(57) Abstract:

PROBLEM TO BE SOLVED: To provide an effective
method and decoder for predicting a motion vector
(MV) for a macroblock (MB) in a bidirectionally
predicted video object plane (B-VOP).

SOLUTION: Direct mode prediction is performed for a
B-VOP macroblock 420 that is co-located with a
field-predicted macroblock of a future anchor image
440, by calculating four field motion vectors
(MV_f,top, MV_f,bot, MV_b,top, MV_b,bot). The four
field motion vectors and their reference fields are
determined from (1) an offset term (MV_D) of the coded
vector of the current macroblock, (2) the two field
motion vectors (MV_top, MV_bot) of the future anchor
image, (3) the reference fields 405, 410 used by the
two field motion vectors of the co-located future
anchor macroblock, and (4) the temporal spacings
(TR_B,top, TR_B,bot, TR_D,top, TR_D,bot), in field
periods, between the current B-VOP fields and the
anchor fields.
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iSStfM v>£> h ^ 7"^ «t f A 7 ^ K^-O 
*x£, bZ?>(DTMt, ^(D^f *—i?(D 9 ttfc~TZ>7 
rirtcit), SftW^-^hy7 p *5<ti;#hA7>f 

^ — 5?Ob^7^ — K Sr^ffli*" 5 fc » (D7 * 17 — K 
»*K<* hA% MV fwtop tta; f (MV top *TRB >top ) /TR 
D.tap + MV D lcLfc^oT^:S^tt, ^^T% TR^tta 

[lf*JS3] »#«2fcfB*tD#fe^fcoT, MV f . top 
fi, if o^fp]^ h^^-v'a ^Irtolt^S?: 

[M*g| 4 1 2 * fete 3 »cffi*o*S-efc o 

i-6fc*a>'7*!7— K^»^<^ h/V, MV ff5ot tt, SC (M 
Vbot*TR B>top ) /TRD.bot + MV D tdLfc^oT, 



fitop ti, if uco^[^]— co f7^>3 y^toif(7) 

7 ] IS*« 5 Sfctt 6 teIB**>*ifc-efco 

"C. ^.bot* 5 i^TRatbottt, Wiaa&oV >r — ^ K 

8 ] if 1 ftV> L 7 <D^-T ix*»fclB«G>* 

(a) MV Bftop = ( (TR B , tof TR D . top ) *MV top ) /T 
RD.top» (W MV B<top =MV f>top -MV top cO-^iCb 

^©hy^7^-/UKi, MV top tc£ 9g2f<>: £;}x£i& 

omottnftttmntcttftu Mv f(top ^^£o^^ — 

(a) ft, ^/V^^lb-<^ h/W MV D =0Ot tlcStRS 
ti, BUlB^: (b) tt, W^OtoktmSlZtiZ, k^^> 

(a) MY b . bot = ( (TR B , bot -TR Dtbot ) *MV bot ) /T 
Rd^,, isiV (b) MVb.^MVf.bot-MV^O-^^L 

TR Dtbot ^**co^^— ^(D#hA7^- /v-Ki:, M 

v?Osl? h ^7 >f KS:TIt5fc6©7* 7- 

M#5ioi:Et©*S"Cfcot, fu 
IS* (a) tt, T^v^liK^ b/u, MV D =0O^#lc31^ 
Six, ttllE^ (b) tt, MV D ^0Oir#31^$ix6, i:^ 

T, h7^fiJ:Uf#hA(07-{-/vKS:tt5, 







^-07**7 — K-^ff, SAD forward . field ^^-r^ 

TMt> **(Di?-7^Dyny^ 17- K=<- 

^^-r^>ite*t^^5-^^^r7-K^, SAD bacward 

If, SAD average , field Sr^:S-r5Xm^, lulSSADOft 
/h lc L as o T ItJlS =i— Klfc-^— KSr l«t & IS t , 

$>oT, SAD forTOdtfield «, (a) T&3z(Dmm<D^>C 

th teit/ (b) aioi!ii©w^n^y^©*h 
[«*3g i 5 ] is*^ i 2 sfctt i 3 KHBfc<D#»-e 

fooT, SAD backward , field H, (a) **(&S!pO^*f 
th *S±tf (b) 3t$k<Dg-m<D'v( ?u7u*y?<D#h 

6] W#4l 2 ft LI Scov^^L^cfa 
t^ST'fcoT, SAD average . field ^, (a) oS^fc 

Kfc^5*6#»ft^<z>^th *3J:tf (b) iflSfci 

h AO^^— /VK§r^i"3ig*<£>:7^— K=» — K 

*i-«**O f 7*— /VK = — WbLfcSlp-^p^n 
v 9 *$Lm LT^*faK:^8!l£;ft,5 t CI <5tf>^* n7n 




teM^l 1-7519 



K»»^h/v f Mv top , ^btficsnaaiohy^is 

y ^fe J;tf# b A©7 ^ < tt-oSr^SI 

i-z>tc#>, y*v<- Yi&Sin^vty— k»«k<^ h/v 

SfeKlSC (MV top *TR B , top ) /TR D , top +MV D KLfc#oT 

d^d -/^<d h y-?y ^KSrwiii-afc 

s/^^-^Kt, TR top f^«t DS^P^^n^ii*^^ 

ft t«il***<0>f ^ ^ ^ ^ — ^ K i: , MV top tc 
<k "9S2St Zti&i&^CD^ri? d^p ^/^CD7^— Kt 

MV f>top tt, f pO^^) fy^-^a y^rtolg: 

[111*42 0] »*4l 8*fcttl 9(Cf5«(D^3- 
^fcoT, TR B , top ^iOTR D>top ^, fulS^O^^ 

si" K * # h ^ 7 ^ —/v K-e fc 5 if 5 A* S: 

[l#*^2 1] '7^VnL2 OOV^tU^fCffi 

f(^f^-^$)oT, ^ (MV bot *TR B>top ) /TRu f bot + 

— /uKSrf'jBS-^Sfcft, 7*17— K^»^<^ h/K MV 

■7^P7'o5'^^hA7>f-;i/KJ:, MV^^CctDS 

?<D# hA7^ — /VKi:, MV bot {d<fc DS^t $tv^>i§ 

I8»*42 2] B#5 2 l^|E«<07* = -y-e*>o 
f, MV fitop tt, * n 0*[p]^(0 h 7 a y S: fco 

SSfcO^SSrttffiLTft^^tb, MV top ^it/MV bot flS 

[!f*S2 3] W*9[2 l*fctt2 2tfB«^=a — ^ 
/vk = — KlbLfc-^ p/p y^*»tD»c h — 







[ff*3®2 4] IR&Sl 7fcV^L2 3©v^fii*l:E 
gBr^/tfcot, 3£ (a) MV Wt0 = ( (TR B>top -T 
Ratop) *MV top ) /TRp.top, ^.tU 5 (b) MV b , t0P =MV 
f.top-MV top ro— Jj\Z Lfc^ct, ^.^CO-?^ n y n y 
^ 05 1- s> 7*7 -f — K Sr^zffl-f 5 fc £>, ^ 17— K3£ 

Rb. topfi^^^"^ * * a -s/fathi/ —A* K t , 
MV top iCt 0 £H5ili*<Z>-^ o 7'oy^©7-f 

-/W K t ©Mro^FllB^raRI^^ L, TR D top «*5(5 
OY^D/ny^rofy^-f-^Ki, MV top (C±ip 
SSPt SftSia^tfJ^nT/n ~ y $ <n>y 4 —/V K t ©IS] 

©^raw^ROPBtcssfjsu, Mv f . t0P ttiaft£D^^ b7"b 

^bK^A-^KK^ h/VMV D =0©i; tiul25£ (a) £51 
MV D #0co^^«tilS^: (b) 

[fit*S2 6] !f#m&^L9<^-f;ft</5M-fB«<D 
f^-n-fcot, (a) MV Kbot = ( (TR B-bot -T 
R D .bot) *MV bot ) /TR D .bot. (b) MV b>bot =MV 

f.bot-MVbotro— 7jIcL^:*5cT, glft©^* n 

It^^ h/v, Mv b-bot £:&s-r5^l§:£"g-^, t 
— a- k t ©iig©B*rattftraiHi£*tjfc l, tr^h** 

©v^n7D7^©#hA7^-/i'Ki, MV^dJ; 0 
SSP t £ frSi®*©-^ ^B7"D7?fl7^ — ;v Ft© n 
©BtHftftWIBI-attSL, MV ( , bot «:^©-^^o7*n 

[B*«2 7] ffi*S2 6tciE*co-7*3-y-efoo 
T, ^kf-^A-^RK^ YA<, MV p =0©i:^^BUlES: 

(a) 4ra«i-s*«, i3j:rm D ^o©t*iwea; a>) 

[000 1] 

^9 h:/U— ^ (B-BOP) ©<fc 5 1trr 9 S9 
/Vff^^^i' (Stic, B-VOPfcitf/^fcttB-VOPSr 

[0 0 0 2] 

[S£*©&fl5] *»M«tW»c**S'*/j:^'A'?-^f f wr 

t U-C*a^iitf3:ifcIS0/IEC/JTCl/SC29/WGll N1796F*3© 
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Jg£* MPEG-4 Video Verification Model Version 8.0' 
Uhj/^*/^, 1997#7£) lCfE®£*VC<^5, MP 
EG Verification Model (VM) 8. QMW&fe (MPEG-4 VM 
8. 0) t K&m* h £. MPEG-2SI i£8i*&teMPEG-4^*#!*& 

XMISO/IEC 13818-2®i&* Information Technology- G 
eneric Coding of Moving Pictures and Associated Au 
dio, Recommendation H. 262'1994#3^ 25 B iCffi^ $ ft. 

[0 0 0 3] MPEG-4 (i, |iftii7V-i>!7-^ MlttC 

^SVS^fcfe©^- K-ffc^— /K05!— h£r 

i£©Hft*HSS:3tg-r5o MPEG-4©^#C#:7 U-A!7- 

^^-t^ *;>b (-r#fc>*>, TvxtmiH) ssks ( 

[00 04] MPEG-4te-v/V^ T3g^"Cro tT^^T* 
ix*:fai:*saflfSr^-*.S. MPEG-4f±»*«J*EB(8, 

[0 0 0 5] MPEG-4 t'-r"^- VM =1 — #/? = — ^ (code 

©A/f7 , y S 'K3-m5. s^t>7*n5/^»i&a« 

CT) »c J; t) n-%<t $ ix5„ ^-T'v 5 ^ ^ h ftt^7/V7 r 
Ty7'i:Ltg$jt, rt»<— ^JI««F*flS(CAE)rA' 

a-;?^77-f 5'i?'r-^]^tt5^-7'^W h (sprites) 4: 1 

[0 0 0 6] ^ttttfc-SftfcT 1 **^-* — «>=»— KflS** 

^W)*i^*3 J;UJffi«(ME/MC) 4 f>Uf^-*5E(2-D)SIB3E 
&S:^t? «*«Ma-C*> o. ME/MCX VffllB*««5 S ft 

ft, tt«H4«)«»©t>i-c©3ivhntr— = — KflsasJ: 

5. ME/MCK*H-S*t--j|a:«jfi«|ftt7*py^^'y9 L V 
^-efc"9, jRtt'!CffilW3ElftHcDCT-C*)ofc. 
[0 0 0 7] 

?n7D-^ (MB) dS^r^g#-T V^-w— ^ = 
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ftsjh/cv^ssjp-f *—*s«:ifua'r&b$, mb^me/mc 

[0 0 0 8] 4*fc, B-VOPrt<DMBfc#U WKk<* h/V- 
(MV) ^®l^^^5Sbm»^R^S*^TVN5o *fc 

= — T-r ^* h tz tb-rmm* * 
[0 0 0 9] 

KlTWUfclf^**^*^ hTBB (VOP) (CjbMtS, 

[ooio] «at, —moTiff/ir 

/Htfco^n^n^ (MB) co±5ft, 3ME0>, 
T, K©IMHk<* bA- (MV) 

— ^te, MV top , -r 
MVdSflufBifi*^ ^ — \?<D Y y "77 ^ —/V Kd*# h 7 

ftigcD®^MBS:^tf 0 
[0 0 11] ^<ow\X -ttt*a*<?5>f ^ — ^ (fci:* 
ff, Wtli^^^!7-K) -to ^89 

^T^7^!7— K) IC^5^T*, "forward (;7;*-T7 — 
K) " MY£^5 D IBS^fttLT, ^«0>;frrttt f ttfc 

[0012] n«nc 9 MV^, i-*fo-fc**o-f ^ 

•ftt^SrSJpt-ra. 7 jt7— \*5o£Tf/<y ? 17— KMV 
b y 7^*5 ± l// $ tt # b A 7 w K Sr^aiJi" 5 * 

[0013] Mv f#top , i-^^^^ffio-r^— 



^ b/W*5C, MV f , top = (MV top *TR B>top ) /TR D , top +MV D 

^Oh^-f^Ki:, MV top K±!9g!gi 

TR D>top ^*5fe^^— hy^7-f- /V'K 
A^Kk©IH©B»IIB»4IIBI»K»jSi-5. B»mAftHQMI 

[0014] m«t(c, Mv f(bot , i-^^^^so-r^- 

h/Utt, MV f(bot = (MV^TRb.^) /TR D , bot + 

^ — v^co^^ — /i. k h oiB©«FlBtt*IBIW»c 
^tiSt, TR Dib0t tt**O>f ^#bA7^- /lxK 

[0 0 1 5] MV B>top , -T^tP^^ffiCOMB^ b V?7 4 

MVb.top= ( (TR B . top -TR D . top ) *MV top ) /TR D top 
(yvV*3£SK<^ h/K MV D =0Ot#) , *fc{iMV bttop 
=MV f , top -MV t0P (MV D ^0Ot#) lafJSotftSSJl 

[0 0 16] MV b>bot , ttt>^^$E<Dm<Oi$yj±7 4 

^» MVb.bot= ( (TR B . bot -TR D(bot ) *MV bot ) /TR Dtbot 

(^/V^^fi)^^ hyw, MVD=0O^#) , £/tteMV bfbot 
=MV f>bot -MV bot (MV D ^0O^r#) iafc#ot, Jfeft* 

[0 0 17]^MiJ:tT, — a^^^/utr 
— Ktt, /<y^l7— K*— K S*MBfiB#P^ 

[ooi8]**ftfi, i&3k<Dmmm (7*y—y^— 

M^^-(D^^!7-K^tf, SAD foTOd , field ^^ 
■r^)XS?r^tp 0 SAD fonnrdi£itld tt, ig^OS^pMBV^ 
^ ft 5 *ig ?iil^MB i cDMB i: O FpT OjS^/u 

^^c^:^^^^ 0 SAD bacW . d>field H, **coS!?MB 







\Z Vt Z> *5S * 53 ^MB £ §l£E ©Iffi © BB tf> IS * A- S ^ 

SAD averaB ., £itld tt, a*«J:U**©*!pMB«>*a* 
[0 0 2 0] ^ — Kftrt— KttSADO*/h»CLfc*o-C 

[00 2 1] SAD foiTCd , field , SAD^^fiex,, *5 
i SAD averagfit f ltld tt, h <y 7**3 X Xftf h A CO 7 4 - 

[0 0 2 2] 

ttzfisx.? 1*7*1/— > (B-BOP) (CioV^T-^^n^u 

(MB) CD J: J^^^/^tf-r^^— MB 
*3 X tz HUBfc a - KflrT S * KiflMI $ tbS 

^5J:t/j&^7^-/vKlc^LT, ^JB^ft^ h/v 
(PMV) ^51^-r5*fe^ffi^i-5 e K*5B 
Ri- S ft KiSKWH IB ©ft tc^q 7^-;vK 

[00 2 3] 01 5 tf^^- • *zf*J= * h 

-7>u—> (VOP) Kft*5J:t«I*fk©^o-fe^Sr 
[H^f^o 71/- A 1052* £o(Dl§xl/^ i/ b (jEJj 
»«J*^v^^M07, SfcRfuJ^i/T* V M08, te± 

V M07£:«U V0P118^SffR^U^ >- M08«r«U 
VOPU9#UJ^0>jMR^i/^ > M09$r*i-<t 5 t-ir^^ 

£ 0 VOPttffifcOTgttfc^rU VOP^ai^stf^^-^v? 

3^V0PT?fc£fc%;ifc*L<5o LfcdSoT, "VOP" O/B 

[0 0 2 4] 7 1/- A105*5<tr/7V — .A115d>k<DV0P 

^^ttfflBu<o«-g-fbati»c«»s*ts. vopii 

7, 118*5±mi9dS, m^^ — ^137, 1 38*51/ <fc 1/139 CD 




mmw-i i-75i9i 



-KflsSJtft. y^^y-r— K-fbi it>K, dct<d 

[0 0 2 5] ^ - Kffc £ ftfcVOP-r — * JUWc, 
/W45^t?fc5e^fcft^, -^1^7*1/^ (MUX) 14 
0-C££^£*u<5o :W:d*ot, t*— #ttlB«BE(*:± 

-e/u^l^if (DEMUX) tcii O^BISix, #HtLfcV0P 
117-118tta#ftS*t, [Eim^tb^o 7 1/— -M55, 165 
fciVnSrt:, V0P117, 118*5<tt/11936^n-Ptt, 

y v — 170 t^r >-^— 7x^^tgsn3y^7^i 

[0 0 2 6] 3^#i?y^tt f if— Oi^tciH®^ 

ysi-fj y — 170^, gffibfcvopta^s, atu^^ 

UfcVOP178 H») Sr^tfii^-CtSo 

if— fi, P3^OV0P178Sr]E^OV0P117^et^x.^:7 
W— A185^r^^T5r tm5p Ifc^ot, 7 
l/— A185tt, g«LfcV0P118, 119 1, 
fcVOP178tSr-S"tf e 
[ 0 0 2 7] ft&CD^Jt LTI4, W*^V0P1095:, ^-— if 

Sr, xx?is*<D£ 5^*J:^"b^filUfcV0P(D 

j; p^=i— KftLTt £V\, 3-— if— tt, ^^7^^y- 
170j&>&, Xtt«HK^ft««o^-f ^^(DJ: 5 ft« 

if-^, ^*H«*-Cfc***OJ: 5lcK)f£T-#5 0 
[0 0 2 8] ^7**7^^7 13 — 170H, ^>-^/M45 
S:^bTgfl$ixSVOPt«#^Sri36S-e# f 

h©i54*y h7 — ^Sr^LTVOPJHfi 

tT^^-ir^v/a ^H, *— CDVOPX^VOPOv— <^>-^^ 

[0 0 2 9] [g| KDt'fWyx^ K03— KteXtf 

fiGHWft, ^77^^;v-^-f-->f^-7x 
tfx^S, >fy^*y 1- ' T7 P y <5r-^ iXa V 

Stt f KflsSixfc 7>r-A-K^— K) VO 

P^toME/MCcDtg^jfi, i 9*tftlB^S:#*.5o 
[0 0 3 0] 0 214, *%KlcLfc3&s 0 fcJi>3— y© 
/py^0t'fc5e xyn^^ij:, ^Wa— KflSVOP (P 






-vop) s.t5s*iRj=- Ktevop (b-vop) <omJ?k<o&m 

[0 0 3 1 ] P-V0Pf2, ^Vh77 A-=E~ KXtt^f 
— 7 is — A • ^e— K«rttJll*T<B*fc=»— KfbSti/ 
(MB) *r£*»5o ^^h7 
;7 — A (INTRA) Oa-KftifctC, ^0^0^ 
* (MB) tt, ftfeOMB^S^iritTtC, a— KflsSftS. 
-O^-^U— A (INTER) a— Kfbfc t fctC, MBft 

Z7ls-J±\zMlsXmfr#)fc=3-Wk£thZ> 0 T>#- 
7 V — J* VOP) ft P-VOP (B-VOP-Ctt* 

vn) -cftfftLtfftfe4v\ i-V0Pds^fflUT = — KfbS 

[0 0 3 2] 7^-17— K^WiirfcK, 5H3E<Z>MBft T 

^SrftSt5o 5»»^<^ b/U (MV) #s ft&Z> 

®^CDMB^^^^cOMBcDtg^{i^^-r e 

»Kj*ft«ft 16X16^MBT*H^<, 8X8(7)/P5/ 

7 Is— J±(D=i~- KfbbfcP-VOP(Z>MB0>P)#3&s, 

[0 0 3 3] B-VOPft /< y * 17- K^ffl, 2^1*3^ 

SO, feiifEIK*— K (:n^T^y^-7^Ag 
«-?$>£) ftbift, P-V0PlcHSbT±&Lfc±5ft 
7*!7— K^W^e— KS:ttffl-e*5. B-VOPft MPEG-4 
MV8. 0 C0TT% >f ^ F77V- A=3— KfbLfcMBSr^tt 
ttflJLfcVvSS, - IxftaElbfcLfciSJo 7^-71/- 

a (fet^tf, vop) ft p~vop (B-vop-cteftv^) -c* 

[0 0 3 4] B-VOP0V* 5/ ^ 17- K^ffl'J ttt^, 
(DMBft ^raWf-BU^fc^TV^ — 7 A£>MBff>1>— 

ftMVtf (^^-!7— KMVir L~C£q bt^TV^) , *&GDiS 
-frOMBfcSH"5aft©MBO*B»ieffi.*r*-t-o B-V0Ptf>2X 

$ — 7U—*>RTf1%r$$)fcWi<Tlsj?--7 W'-AOU 

Z7 — Kftl**^? — K»»MV34S, ft&tf>3S^£>MB 
K^5»SEOMBOffi»aBtt4:*i-« 

at^a-&toMBd^#e>ix5o 

[0 0 3 5] «: tc^< P-VOP \zWffi\£thfr^iru7 u y 
?mx8<DT h*/<>* b^ffllHe- K4:ft«t5t, B-VO 
POa^^—K^SOi: t fcfc, 8X80>^o >y?<Dtz#><D 

Jtui-fcft©*— ^Sr^Kfc-a-f t-, b-vopot/o^ 

<Otz#><D&W)^* h/ls%m<1tlibl l Z, P-V0?(D8X8<D7 



[0 0 3 6] ^^OOT—^tC^^tbTVN^zcvrr — 
ytt, «4fc3-^210i f »»«*»tB220i: f 
^#6230^:, t"^^^^ — 3 — ^240 i:S:-&^ f £-*ft 
^^W^—^AAS:*— S^^205-eSE«i-6 o » 
»*63fe»ffi220, ^S5M«^fg230, — ^ 

240XtW£:R — ^210tt* fc, MPEG-40V*5* — VO 
P_of_arbitrary_shape~<£> <fc 5 fcVOPJBttflMBA* t> * — 

#, VGPttft;fr»<&»iR«:*U Lfc^ox, — 

[0 0 3 7] S«ric*tL*:TV*— V0P«tB250dS, ^K) 
«^ffi220atJ^»«lflt«IB230Jc: i otttfflf^/: 
<^f¥«/&£;ftfcT:/;*; — V0P«:-£x.5 D SlftOVOPtf*, ^ 
^^^-r — a — ^240"C3iV3— K{bSttfc9» (resid 
ue) Sr#^5te*Jc f •9-^h9^#260-C»SM*«Six 

40tt, DCTtCj:»9, -^/V^^W^I^ (MUX) 28(HC7^* 

^^r-«« (^y^tf, m&$m ^^^$-^5 0 

^-^r — 3 — ^240f4*fc f S«^^tLfctucDTV^-VOP 
«tg250-A^-r^>fca6co, JP»«270T?»»*««230 

[0 0 3 8] #tttff# (ixtf, ^«J-<^ h/w) Ht, » 
lb*i««IB220^feMUX280^i:^*fett i VOPOJgttS:* 
■TJMttlHlHt, = — Hb«tB210^e>MUX280^^ 

btv5 0 Mux280fi, 5 #a-fbLfc5*— b y- 

[0 0 3 9] ai^33— ^AASftSiB*^— ^f^, YU 
V 4 : 2 : 07*-^y b^r^i-^o VOPtt, 

[0 0 4 0] ^W*0-9— y-Ofcft^WBJjS 

^^-To (ME/MC) tt f -JIS^, 

y?) tS97^^t-fS«i:i,57 f o^^ 
^py^f4l9©7U-Arti:fc5o MJjfa^m (B) 3 

m7*y*<omtiL&, fem^th/l- (MV) T*fo^, ^tb 
ft (x) XWSitt (y) M^tt5 0 MV/ilWfrOjE 

[0 0 4 1]»»i«LfcS^oy^jftS l ^SiJLfc^ 
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5^i#Tt5. ME/MCO^n y^tt, 16X16^^ V— 

[0042] Mva>«£tt:, i/2B*tcia:^sixSo «ra 

(i+x, j+y)te, xXVy^j^LT^ISSft, SE«R0>¥#-C 

a, B, c&tflrc^i-J; 
5 0 l/2B5*Ofit«tt, a, b, cXtWC^i- £ o &R"C 
«"t". ^roir^D, a=A, b=(A+B)//2, c=(A+C)//2, 
SOT>=(A+B+C+D)//4T*fct), r^T% V/'ttA* 6>*t5 
#J0JI«:*1\ if^PSBfi, BtjfB^MPEG-4 VM8. 0, ft 
feme u 4ls9 — X'—x£tl1ZTV*;l'¥T*m<D¥y* 

arjr^v^z h^mn&wamfeRzmflt" tm-rz, r. 

Hii?08/897, 847 (1997^7 J 21 B tiB JE) (rOJtltr 

16Xl6<7>jS*OS,^E(75'V^t3b r n (MB) #5, 

ME/MC3— K{k$ix6-<#^, 8x8<D®lif ^EBo^y o 



[0 0 4 4] 7^- /V Ktf>16Xl6<D-^ n^n ^ ^ (M 
B) 2*, ^600T^^tbTV^5o MBIl, fSgfc#g07^ 
>-602, 604, 606, 608, 610, 612, 614, 616 1, 
S ^9-^^603, 605, 607, 609, 611, 613, 615, 617 £ 

-y-/sti, (x«, wiax/ss 

2) ^K*r*iven»jati-5. 

[0 0 4 5] v^OO^ilf*?^ vri^— ^ — 

^645T^£ft£^EPte, 9>f ^602-6170??£<£>3£T* 
#X.£^lh 0 ^J^«, lffi600<&»l#S<D9-< ^"efc5 
«afc*aO7^^602tt f MB650C0^1#iO^^ V-Ct 
«»#B©9*f V604tt, MB650tf>Jg2#g<O^ 

<D ?-<^606 f 608, 610, 612, 614, 616tt f -tft-eft, 
MB650tD^3#S36^^8#g(O7-f ^iftSJ: 9 i-H 

g(D^>T^603, 605, 607, 609, 611, 613, 615, 617 
tt, 16x8^««685«r»*rs. 
[0 04 6] ?~V0?(Dtifr(DUC^— KSra^i-5fc*<7> 

^E— Kfc*tU *-cD16xi6<D:/n ^ (#)x. 

fcf, SAD 16 (MV x ,MV y )) S.VE9o(D8X80^n^^ (^Jx. 
tf, SAD 8 (MV xl ,MV yl ), SAD 8 (MV l2 ,MY y2 ), SAD 8 (MV x3 ,MV 
y3 ), &tfSAD 8 (MV x4 ,MV y4 )) (DtcZXDmitimftft (SA 
D) 4r#5 0 TIB^*10»* f 8x8^^a04ril^L, 
*ft£Wtti, 16xl6^^a)^il^-1-^ 0 
[0 0 4 7] 



If ^$AD*(MV^MV^)<SAD^(MV x ,MV y )-\29 , 



[004 8] 4 y^-^t/cifftMit, 



top v 



(MV^, MV T _ top ), XT/ SADfco^OlV; 



i — bottom* 



Vowt-w ^r-bottJ I*. bs'/ ("«*) 
^ (bottom) 7 4— ^ KcDfcifcGSSlti^ h/VT*fc^>„ 

SAD top S.^SAD bottOT ©fci6cO) 

(a) SADJMV X ,MVJ , (b) (MV^.MV^) + 129 , 



[004 9] ^-8!]^— mS©±SSI4, «T©S;2©* 
sad /hofcroSrS&i-sr i: tcS^V^TVS. S;2ro (a) 
as&/h-efe5$§-g-, 16x 16(0^8!!^^ $ft5o *2co 
(b) 8X8»^*)«« (7K^ 

h^W*— K) ^ffi^^ttS, S2® (c) asft/h-cfc 

"65" J*Nb/4+ld»P>»5>tt5. 
1&2) 



and ( c ) SAD, a JMV^ . MV y _ lop .) + SAD Muni (MV xJ>etKm , MV y _ bo!tum ) + 65 
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X80^p y?(D&* ICMVa*— o^5ofc£>) o fc\z, ~ 



tcfr<Di/2mmm^<ni/i6mmm<D&&$:7jki- 0 

tf, 0^fe2/l6^o»cA»6>ix, 3/l6^fel3/l6dSl/2»cA 
»?>*l, l4/16aWl5/l6*S2/2=lKAfefeJx5 0 
1^1] 



l/16M»flg 0 2 3 4 5 6 7 8 9 10 11 12 13 14 15 



0 0 0 1 1 1 1 1 1 



i 



[0 0 5 1 ] 7 4 — /V K-^W-Ctt, HO016X80^n 

(#Jx.fcf, ^-T^602, 604, 606, 608, 610, 612, 61 
4, 616) tt, W36*HfcSHS7^-/vK«:tt«i-5 by 

lc, 7i/-AoII(:^Jti5 0 l/2Wi*«>ailOEriR] 

[0 0 5 2] Zlo<D^ u ^T-^-^yu y?<Dti£><DW 
£ 0 zkTdt^tt. ^T^^tt*l/2tli*(D^^iry h 

h^jft^tt, n<btiz>? v ^-r^xwcom; 

SEftfittt, ^-^)7^-;vK(Z)7^>'Of|oiS^(D 

[0 0 5 3] T hVO'* b^»S«0*2^*«tt > /l- 
;t^7p s/^^fcft^MCSrafc'&to*^, MPEG-4 V 

[00 5 4] B-V0PKSM-5#*O=2— Kffcft*** ^ £ 
X+mm£thZ> 0 B-V0Pc^i 5 WINTER = — KfcLTVOPtC 

^— K, *5 J: 157* 17 — K^r— K* J *>5. *&*<7>Hote 

V^o B-VOPO^aOLfcT'D y^te#^e— KlCfctLT^ 
7^-yn^^D^l/ r >^ (tatf7 1/- 

[0 0 5 5] -oOB-V0P^, **5^E— 
fc«*6MB4:to^4:iS-e#5 0 "B-V0P" i^7/§lS 

^L, :o:^(^^4^o WflRflSK:, P-V0P*5<fc 



v\> 

[0 0 5 6] ^iEa^- KB-V0P MBtc^t IT, Wtt££ 

7*7— KMVfcStL, 4t>^^7- K*JJ:W»* 

HmKDMB "RU,*^:^ ^MV (fci;ifi, 7*7 — K* 
K) ttt\^f4 9 9 — t LTtt^SiX 

SMVSrEtttL, ffi^jfci-SiBJHMrWJSfc 

[0 0 5 7] M£*^7^MV£«fl!U fi3SaHff#?^ 

*»6>T"C*6fcflE*L-C i £<^8$Sr6MV(D7*7-K 
MVteB-V0PO?a£tf>MBGD7*7 — KMVKl*tU /Uf^f 

y ? 7 — Fmtt-VQP<Dl%.tE<Z>lB<D'<y ^ 17— KMVMStf 
U 7°U-7^^-£ LT^ffi^tV^o iMECDMB^MVte 

tfcfc, 7 p uf^^^-^lt, ft#«fctt-*Hfc;S*L 

[0 0 5 8] aft©lffi*SV0P(D2ci8fcflrfli-r51i^ f H 
ft^MB^^f-t-^r^^ ^ f4* a [:^^^5o 

[0 0 5 9] ^fy^-V^ • 3— K>fbLfcB-V0P{^ 
V^T, h jy <t^# h A 7 ^ KO-t Jx-PtLtt, 

h y-?7 4— )\s Y7*V— Kfcitf#hA7>f-;w 
K7^-I7— K, 4bW:ft07y*- MB<7> h y-?7 4 — 
)VY7irV— Kio±tf#hA7>f-;VK7t7- KSrS 
"To ma£^MB*5±V7^l7— KMB, *3<tV/4fcttS*E 
<75MB*5<t^<y^l7— KMBf4, J&^EcOMBCOME/MC^ — K 
fkfc*tUTfltfflSix4V\ B-V0Pf4INTRA^— K{k LfcMB 
^tf, B-V0P^)#MBJ4ME/MC^- K-fkSixSr t left 
5 0 7^7— KJSi^/^^^ !7— K7^*- MBteP-VOP 
4fcttI-V0P^e>-efcQJ»T, 7 i/-AJf:fj:7^;v 

[0 0 6 0] ^fy^- ^Ufc, KCOB-V0 



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ptc^u m^xD-zimte^m&W)^? (pmv) 

[0 0 6 1] 
[S2] 

&2 

b-y7*7^-/uK, /<>>^7-K 2 

F-A7^ — ^ K. dlSL^-ZlzJl 3 
[S3] 

ii 

7U— a. 7*7— K 0 



7 V — A; /^-y ^ 7 — K 



7^-;vK, 7*7— K 


0, 1 


7^ — /UK, /*<y^7— K 


2. 3 



0. 1, 2, 3 



[0 0 6 2] fcir^tf, *3te7*7-K^-K(*:£: 
tU&> "7>f-;vK, 7*7— K") £r*>o, 9ME<£>7 
>r —/u K^e— KMBtc*f LT, b v 7" 7 4 —A- K7t7- 
K ( "0" ) Soil/sK h^7>f — — K 

( "o M ) h^y^jff-fr&mtsthZo 

(DmcD&W)^* hMZG&MFfV, fKMBlC^LPMVt 

d^-CfcSo 7^^ ttSfcflC**— K©MBK» 

[0 0 6 4] B-V0PO|tjR^~ MkTfcoT 
84 



KJ3 J:tf/<2/ * 7 — KMVte, «fPp1WJC*^P-V0P mbcomv 
— oO^/v^MVtcJ: 9S|g£rfcoT, v^— ^U: 

[0 0 6 5] «4tt, PMV^^it/^^MB^^^iC 

£<5v^fc^j£oB-V0P<^»K<* b^£=> — K{trT6fc 
ag>fc«[«S*t6ri:*r»#aLTS^-. B-V0Plc*tL, ^ 
SOS*-** bA-, pmv[ ]<oyi|36S4h*.feix, i?n^*>H 
O (?c£:;Ui, pmv[0], pmvtl], pmv [23 # <fc t/pmv [3] ) 

^ f;W:it^ot^nt*l:, pmv[ l^yfj' 
^^Sr8t*-e#So B-V0P£=3— KftLfcfHC, PMV-<* 

£ S^HffSitSo o, -o*yt«29o(DPMV 

[0 0 6 6] fctitf, 7*7— K, 7 ^ — K^SO L 
/cMV^noo^Sb^^ b/u^r^-r^o pmv to] 

hy7 P 7^-;VK, 7*7 — K^^UPMVCfoD, pmv 
[0]|j:«fA7^-;vK, 7*7 — KK#LPMV-efc£ 0 
/<y^7 — K, 7>f — /V K^iffi LfcMBtd*tUT, pmv [2] 
teb^T^-f — /VbV<y^7 — Kfc*tLPMV-Cfc£ 0 ^ 
*fR], 7>f — /UK^fflLfcMBlcStfb, pmv[0]«h^^7 
W — K7t 7— KtcSJUTPMVCfc 9 , pmv[l3te# b 
y,< 7 * 7 — V\Z& LPMVC fc 9 , pmv [2] 
/VK'^^7— KKStLPMV-efcO, pmv 
[33 VA7^ —/V K'* y ^ 7 — Kfc» LPMVT-fc 5 0 
7*7 — Y%*L\%'<v$y— Kf»L*:7U- A^e— K 
B-V0PMBK#U of— cDMV^fc 9 , pmv[0]tO^2$7* 17 
- Kfc# Uffiffl S ix, pmv [2] y * 17- Kfc# LTffi 

^e— KB-VOPMBtdSJ L, -OCOMV,' t?i^7t7- KM 
Vfc*M-5pmv[0] *5 iU 5 ^^^ 7— KMVKJJi-Spmv[2] 
^*So WWJfeUfc "M*f-f-5fcfecOpmv[ ]" ft, - 

[0 0 6 7] 
[84] 



its 



7*17 

-K7 
u — A 

^e-K 



7-K 
7 U— A 

^e-K 



7 U A 

^-K 



7*7 
-K7^ 
— /v K 
^e-K 



7— K 7-< — 

7^-— /U K 

K ^e— K 
^— K 







^(DDmvrOl non 0,1 


2,3 0.1.2.3 0.1 2.3 0.1.2.3 






^>CDDmv[0] non 0 


2 0.2 0,1 2.3 0,1.2.3 



[0 0 6 81 *42*, <Mc, &&<omKtt1rZ>^mw$: S^-T-<<, *55Ka)S«S:lltTi-Sfc«>^>iiEafB"t? 
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[0 0 6 9] INTRA^p y^DCajS^fiiWS, dct_typeO 
" to fe-f , MPEG-4MV8. 0\zm W £ tlX V * 5 CO t (H C 

y^*»6> = tf— Six5-<t i#£B^T, MPEG-4MV8. 0 

\z®,m£ti>r\,^z>£?izm'n£tiZo z<Djikm±, dctt 

ype^^fiE^MBfc J: ULLIS^ n y * fcfcT L T 13 fc 
ot#©*«rtBkft5, dct_type^S#S£:, AC^SO 

t^:5o £i^njy^sftv^#, if PdSAC^S!l 

[0 0 7 0] H 4 tt*»HC Lfcj»Sot-f y 
• =* — K*fcbfcB-VOP<D hy^7-f- /u Koacaf^E— K 
<o = — K{b«rH*t-5o 8MSE©ICi:RIC«#»4ffi« 

*S, (1) 16X16 (!7 U — A) MB, (2) -O'h^MB**: 
tt (3) 8X8 (JfcfrMB) JD£o\z=*— Kfls«S*t*i:# 
tctt, SjE^^o^y? (MB) C#Lt f 

Ht^e- Ks&sflWB 
[0 0 7 1 ] ag^- KT4»tt f *^{4g-3tt Lfc5fefc 

<o#— v — * • wkLtim\zM-tz>\£&*: 

T8U*tC, ^aiMBS:«JSfci-5. S*IrI7 ^ —A* K&lbtt 
{JtLfcMB hy7 p 7>f-;VK7^7-K, # 

— K, *5 <£ Xf^ YJ±7 ^ Y'* y&V— K) 
7^-/VK^ft^h^ (MV) J3, **£>T>';*7 — PHfe 

[0 0 7 2] toS«tt, jKKfcUfc*— 

[0 0 7 3] 3H5E£>MB<£> h^7 P 7^ — /V- K»C*hj- S^SI 
tt, *^7y^/-Iffi (CtUS, MV=0-CI-VOP*fcttP 

5*^7 — ltfg><D f >y ~fy J —A- KMBfiifi*0>T 
— ®<fefl> h y^7^- /uKSfctt* h A7^- A K 




mz y %L$E<Dm<Dhy7"7*<*- ^KWt-57y*- MB 
* hA-MVp^x.^*, MBSfil^i IPMBJCSoV^T*^ 

[0 0 7 4] mftcOMBO/K h A^^— J\sh*\zM-tZ>&W) 

5) *3fe<Z>T:ot7— MB<£>, »a:LTtt«-3ltfeixfc#b 

[0 0 7 5] S^ftfc, h y-?-7<4 — AK-^ hAtt, 
(a) ^bTffi«<5ttUfc*3K<^r^— MB^h^y^ 
:7*— /uK*»6>»*:W*i: (b) *t^LTffiS<5Jtbfc 

^tb-<^ h^fi, (a) »J«:LTffi«<3^Lfc**or 
MB(D# fA7>f — /WK*»6#fcHi«4: (b) *tJ^ 
LT{±S<5tt btbtztf f^7>f — /V KMVtdJ: 9 mm t L 

[0 0 7 6] g|4^^Lfc < t 5M, ^SCOB-VOP MV420 

ia*^T^*— VOP MB400tth^^7-f — 
±V#hA7>f— ^K405Sr-&*. *^7^- V0P M 
B440tt 4 —;V K450*5 «t U 5 ^ b A7>f —/V K445 

[0 0 7 7] Mv top ii, t&^<dt^jj— mmcoMM^M 

^MB§:^i-**OT — MB440CO h ^^7^^ K450 

^*ti-5 7^r7— k»»^<^ h/vx*fc5 0 Mv top te5fc£> 

VOP400JC §|L T^IB«3{C 7*17— KT*fc50T% -t 

Wi7^7 - KMV-efc^o :©teKh^7>f- 

*t»53ftS, MV top tt3S*OT^*^MB40000/KhA^^ 
— K405SrS2pi-t-S o MV f , top teSiifc^MB<£> h 

;VK^)7*7- KMB-Cfcfl, MV B> top (i!ifficDMB^ 

*V-p*LRIftSix6*3fe*3j:Va*<Z>T^* r *J<D 

[0 0 7 8] Fy^^^ /uKJc#i-6»«K<* h/utt 
MV_f,top = (TR_B,top * MV_top) / TR_D,top + MV_D
When MV_D = 0:
MV_b,top = ((TR_B,top - TR_D,top) * MV_top) / TR_D,top
When MV_D ≠ 0:
MV_b,top = MV_f,top - MV_top

tt#fc, ?4-;\< KB**— Kfc» LTP-VOP-efo^o 
19, 16xi6^o^Usfi/^ia»^— Ktttt^StuS. TR 

£§I3E<E>B-V0P420 <Z> h y ^ — /u K430 £ 
(^BB^-7-<— yvKO^BBWWH-CfcSo TR D(top te, ifl 

[0.0 7 9] M 5 ft, ffliat^oNy^w 

^ . =3— K>fbLfcB-V0P<D7tf hA7>f — A*- K^mgpe— 

J;t*BI5J;i*3ttTV^ 0 El 4 t IBI«©W##f*SftT 

(MB) 440<O7K>A7>f — yWK445^*J"t-5 7*l7 

—/V K410j&># h A^^— K405>5^^i*;h^a s £2Pi 

/^K405S:S3S£:i-£o 
[0 0 8 0] #bA7^-;vKi:»t^i^ 



MV_f,bot = (TR_B,bot * MV_bot) / TR_D,bot + MV_D
When MV_D = 0:
MV_b,bot = ((TR_B,bot - TR_D,bot) * MV_bot) / TR_D,bot
When MV_D ≠ 0:
MV_b,bot = MV_f,bot - MV_bot

5) t, 3g,^E<7>B-YOP420<D# b J*425 k <DWi<Df$K6tot£f& 

-f— /VK405) t, h AXip:7^— ^K445i£> 

[0 0 8 1] 0 4:&£tf5^#Hc:fc^T, TR^, TR 



D, top* 



TRB.b 



TR, 



D. bot 



[0 0 8 2] 

TR_D,top or TR_D,bot = 2 * (TR_future - TR_past) + δ,
TR_B,top or TR_B,bot = 2 * (TR_current - TR_past) + δ



TRfurre' TRcu: 



T*fc 9 , 7 -f -A- KW^^m»*MIW^^#30«*ttiE 

[0 0 8 3] fcir^tf, »-^«l^*»«(D?!lO»3e 
"1" f***OT^^7 — :7^ — /VW h y7 p 7-<-;VK 
"Cfc t» , &m t4of:7^ — A' KjW* hA7^ —/v Kt? 

fc5ci:^to :^): £teH4i;:^£;h,Tv^6o ffi^ 

"1" tt, *3fe07y*-7>f- ;VK«fA7^- a- 

[0 0 8 4] 
[S5] 





— ;v K 










fy^^ *bA7^ 




















0 -1 


0 


1 






0 0 


0 


0 






1 -1 


-1 


1 






1 0 


-1 


0 



[0 0 8 5] SWMMea— KfcOfcftR, 1S43-K 
tc, B-V0P{c»br, MBtt (1) ES*~K, (2) tt^ 
W*-K*:*tf) , (3) 7^-/WK^ib*« 



[0 0 8 6] B-V0PO7 >f -/V- K^»tt«tUfcMBlc:*f L 
r, a-^-ftLfcr^^-Pi^tciHLT, */h©/u5-^> 
^. l/21i*SAD(cS<3 < 7*17— K, ^ y ^ !7 — K, * 



• 
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(1) SAD direct +bl 

(2) SAD forard +b2 

(3) SAD^^+bS 

(4) SAD aYerase +b4 

(5) SAD fonrard , field +b3 

(6) SAD D8(:kwM . t field +b3 

(7) SAD Bverege . field +b4 

mz-^te, as*— k, k»»^», 

■?u7\s V *y-7) %zi.XS-7 4—)VY=t— K (-rft*>*>, 
1. Title of Invention

PREDICTION AND CODING OF BI-DIRECTIONALLY PREDICTED VIDEO 
OBJECT PLANES FOR INTERLACED DIGITAL VIDEO 

2. Claims 

1. A method for calculating direct mode
motion vectors for a current bi-directionally
predicted, field coded image having top and bottom
fields, in a sequence of digital video images,
comprising the steps of:

determining a past field coded reference image
having top and bottom fields, and a future field
coded reference image having top and bottom fields;

wherein the future image is predicted using the
past image such that MV_top, a forward motion vector
of the top field of the future image, references one
of the top and bottom fields of the past image, and
MV_bot, a forward motion vector of the bottom field of
the future image, references one of the top and
bottom fields of said past image; and

determining forward and backward motion vectors for
predicting at least one of the top and bottom fields
of the current image by scaling the forward motion
vector of the corresponding field of the future
image.

2. The method of claim 1, wherein:

MV_f,top, the forward motion vector for predicting
the top field of the current image, is determined
according to the expression
(MV_top * TR_B,top) / TR_D,top + MV_D;

where TR_B,top corresponds to a temporal spacing
between the top field of the current image and the
field of the past image which is referenced by MV_top,
TR_D,top corresponds to a temporal spacing between the
top field of the future image and the field of the
past image which is referenced by MV_top, and MV_D is a
delta motion vector.

3. The method of claim 2, wherein:

MV_f,top is determined using integer division with
truncation toward zero; and

MV_top and MV_bot are integer half-luma pel motion
vectors.

4. The method of claim 2 or 3, wherein:

TR_B,top and TR_D,top incorporate a temporal
correction which accounts for whether said current
field coded image is top field first or bottom field
first.

5. The method of one of the preceding claims,
wherein:

MV_f,bot, the forward motion vector for predicting
the bottom field of the current image, is determined
according to the expression
(MV_bot * TR_B,bot) / TR_D,bot + MV_D;

where TR_B,bot corresponds to a temporal spacing
between the bottom field of the current image and
the field of the past image which is referenced by
MV_bot, TR_D,bot corresponds to a temporal spacing
between the bottom field of the future image and the
field of the past image which is referenced by MV_bot,
and MV_D is a delta motion vector.

6. The method of claim 5, wherein:

MV_f,bot is determined using integer division with
truncation toward zero; and

MV_top and MV_bot are integer half-luma pel motion
vectors.

7. The method of claim 5 or 6, wherein:

TR_B,bot and TR_D,bot incorporate a temporal
correction which accounts for whether said current
field coded image is top field first or bottom field
first.

8. The method of one of the preceding claims,
wherein:

MV_b,top, the backward motion vector for
predicting the top field of the current image, is
determined according to one of the equations
(a) MV_b,top = ((TR_B,top - TR_D,top) * MV_top) / TR_D,top
and (b) MV_b,top = MV_f,top - MV_top;

where TR_B,top corresponds to a temporal spacing
between the top field of the current image and the
field of the past image which is referenced by MV_top,
TR_D,top corresponds to a temporal spacing between the
top field of the future image and the field of the
past image which is referenced by MV_top, and MV_f,top is
the forward motion vector for predicting the top
field of the current image.

9. The method of claim 8, wherein:

said equation (a) is selected when a delta
motion vector MV_D = 0, and said equation (b) is
selected when MV_D ≠ 0.

10. The method of one of the preceding claims,
wherein:

MV_b,bot, the backward motion vector for
predicting the bottom field of the current image, is
determined according to one of the equations
(a) MV_b,bot = ((TR_B,bot - TR_D,bot) * MV_bot) / TR_D,bot
and (b) MV_b,bot = MV_f,bot - MV_bot;

where TR_B,bot corresponds to a temporal spacing
between the bottom field of the current image and
the field of the past image which is referenced by
MV_bot, TR_D,bot corresponds to a temporal spacing
between the bottom field of the future image and the
field of the past image which is referenced by MV_bot,
and MV_f,bot is the forward motion vector for
predicting the bottom field of the current image.

11. The method of claim 10, wherein:

said equation (a) is selected when a delta
motion vector MV_D = 0, and said equation (b) is
selected when MV_D ≠ 0.
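
The expressions recited in claims 2 to 11 can be illustrated with a short Python sketch. It is not taken from the specification: the function names, the tuple representation of a vector, and the choice to test MV_D = 0 separately for each component are assumptions made for illustration. Note that Python's `//` operator rounds toward negative infinity, so a small helper is used to obtain the truncation toward zero recited in claims 3 and 6.

```python
def div_trunc(n: int, d: int) -> int:
    """Integer division that truncates toward zero (Python's // floors instead)."""
    q = abs(n) // abs(d)
    return q if (n >= 0) == (d > 0) else -q


def direct_mode_field_mvs(mv, tr_b, tr_d, mv_d):
    """Forward and backward direct-mode vectors for one field (top or bottom).

    mv    -- (x, y) forward vector of the co-located future-anchor field, half-pel units
    tr_b  -- temporal spacing TR_B for this field
    tr_d  -- temporal spacing TR_D for this field
    mv_d  -- (x, y) delta motion vector MV_D
    Assumption: the MV_D = 0 test is applied per component.
    """
    forward, backward = [], []
    for c, delta in zip(mv, mv_d):
        f = div_trunc(c * tr_b, tr_d) + delta                   # (MV * TR_B) / TR_D + MV_D
        if delta == 0:
            b = div_trunc((tr_b - tr_d) * c, tr_d)              # equation (a)
        else:
            b = f - c                                           # equation (b)
        forward.append(f)
        backward.append(b)
    return tuple(forward), tuple(backward)


if __name__ == "__main__":
    # Illustrative numbers only.
    print(direct_mode_field_mvs((7, -5), tr_b=1, tr_d=3, mv_d=(0, 0)))
    print(direct_mode_field_mvs((7, -5), tr_b=1, tr_d=3, mv_d=(1, 0)))
```

The same function would be called once with the top-field quantities (MV_top, TR_B,top, TR_D,top) and once with the bottom-field quantities (MV_bot, TR_B,bot, TR_D,bot), giving the four field vectors.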

12. A method for selecting a coding mode for a
current predicted, field coded macroblock having top
and bottom fields, in a sequence of digital video
images, comprising the steps of:

determining a forward sum of absolute
differences error, SAD_forward,field, for the current
macroblock relative to a past reference macroblock,
which corresponds to a forward coding mode;

determining a backward sum of absolute
differences error, SAD_backward,field, for the current
macroblock relative to a future reference
macroblock, which corresponds to a backward coding
mode;

determining an average sum of absolute
differences error, SAD_average,field, for the current
macroblock relative to an average of said past and
future reference macroblocks, which corresponds to
an average coding mode; and

selecting said coding mode according to the
minimum of said SADs.

13. The method of claim 12, comprising the
further step of:

selecting said coding mode according to the
minimum of respective sums of said SADs with
corresponding bias terms which account for the
number of required motion vectors of the respective
coding modes.

14. The method of claim 12 or 13, wherein:

SAD_forward,field is determined according to a sum
of: (a) a sum of absolute differences for the top
field of the current macroblock relative to a top
field of the past reference macroblock, and (b) a
sum of absolute differences for the bottom field of
the current macroblock relative to a bottom field of
the past reference macroblock.

15. The method of one of claims 12 to 14,
wherein:

SAD_backward,field is determined according to a sum
of: (a) a sum of absolute differences for the top
field of the current macroblock relative to a top
field of the future reference macroblock, and (b) a
sum of absolute differences for the bottom field of
the current macroblock relative to a bottom field of
the future reference macroblock.

16. The method of one of claims 12 to 15,
wherein:

SAD_average,field is determined according to a sum
of: (a) a sum of absolute differences for the top
field of the current macroblock relative to an
average of the top fields of the past and future
reference macroblocks, and (b) a sum of absolute
differences for the bottom field of the current
macroblock relative to an average of the bottom
fields of the past and future reference macroblocks.
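
The mode selection of claims 12 to 16 can be sketched as follows. This is an illustration only: the representation of a field as a list of pixel rows, the rounding used when averaging the past and future references, the mode names, and the bias values are all assumptions that do not come from the claims.

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized pixel arrays."""
    return sum(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def average(a, b):
    """Pixel-wise average of two references (rounding up on .5 is an assumption)."""
    return [[(x + y + 1) // 2 for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def field_sads(cur, past, future):
    """cur, past, future: dicts with 'top' and 'bot' fields of a macroblock."""
    fields = ("top", "bot")
    return {
        "forward_field": sum(sad(cur[f], past[f]) for f in fields),
        "backward_field": sum(sad(cur[f], future[f]) for f in fields),
        "average_field": sum(sad(cur[f], average(past[f], future[f])) for f in fields),
    }


def select_mode(sads, biases=None):
    """Pick the mode with the smallest SAD plus bias (claim 13); no bias gives claim 12."""
    biases = biases or {}
    return min(sads, key=lambda mode: sads[mode] + biases.get(mode, 0))


if __name__ == "__main__":
    # Tiny one-row "fields" with made-up pixel values.
    cur = {"top": [[10, 12]], "bot": [[20, 22]]}
    past = {"top": [[9, 12]], "bot": [[20, 25]]}
    future = {"top": [[14, 12]], "bot": [[20, 21]]}
    sads = field_sads(cur, past, future)
    print(sads, select_mode(sads), select_mode(sads, {"average_field": 2}))
```

In this sketch a larger bias on the average mode stands in for the extra motion vectors that mode would require, so a slightly better SAD does not automatically win.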

17. A decoder for recovering a current, direct
mode, field coded macroblock having top and bottom
fields in a sequence of digital video macroblocks
from a received bitstream, wherein said current
macroblock is bi-directionally predicted using a
past field coded reference macroblock having top and
bottom fields, and a future field coded reference
macroblock having top and bottom fields, comprising:

means for recovering MV_top, a forward motion
vector of the top field of the future macroblock
which references one of the top and bottom fields of
the past macroblock, and MV_bot, a forward motion
vector of the bottom field of the future macroblock
which references one of the top and bottom fields of
said past macroblock; and

means for determining forward and backward
motion vectors for predicting at least one of the
top and bottom fields of the current macroblock by
scaling the forward motion vector of the
corresponding field of the future macroblock.

18. The decoder of claim 17, further
comprising:

means for determining $MV_{f,top}$, the forward motion
vector for predicting the top field of the current
macroblock, according to the expression

$(MV_{top} \cdot TR_{B,top}) / TR_{D,top} + MV_D$;

where $TR_{B,top}$ corresponds to a temporal spacing
between the top field of the current macroblock and
the field of the past macroblock which is referenced
by $MV_{top}$, $TR_{D,top}$ corresponds to a temporal spacing
between the top field of the future macroblock and
the field of the past macroblock which is referenced
by $MV_{top}$, and $MV_D$ is a delta motion vector.

19. The decoder of claim 18, wherein:

$MV_{f,top}$ is determined using integer division with
truncation toward zero; and

$MV_{top}$ and $MV_{bot}$ are integer half-luma pel motion
vectors.

20. The decoder of claim 18 or 19, wherein:

$TR_{B,top}$ and $TR_{D,top}$ incorporate a temporal
correction which accounts for whether said current
field coded image is top field first or bottom field
first.

21. The decoder of one of claims 17 to 20,
further comprising:

means for determining $MV_{f,bot}$, the forward motion
vector for predicting the bottom field of the
current macroblock, according to the expression

$(MV_{bot} \cdot TR_{B,bot}) / TR_{D,bot} + MV_D$;

where $TR_{B,bot}$ corresponds to a temporal spacing
between the bottom field of the current macroblock
and the field of the past macroblock which is
referenced by $MV_{bot}$, $TR_{D,bot}$ corresponds to a temporal
spacing between the bottom field of the future
macroblock and the field of the past macroblock
which is referenced by $MV_{bot}$, and $MV_D$ is a delta
motion vector.

22. The decoder of claim 21, wherein:

$MV_{f,bot}$ is determined using integer division with
truncation toward zero; and

$MV_{top}$ and $MV_{bot}$ are integer half-luma pel motion
vectors.

23. The decoder of claim 21 or 22, wherein:

$TR_{B,bot}$ and $TR_{D,bot}$ incorporate a temporal
correction which accounts for whether said current
field coded image is top field first or bottom field
first.

24. The decoder of one of claims 17 to 23, 
further comprising: 

means for determining $MV_{b,top}$, the backward
motion vector for predicting the top field of the
current macroblock, according to one of the
equations
(a) $MV_{b,top} = ((TR_{B,top} - TR_{D,top}) \cdot MV_{top}) / TR_{D,top}$ and
(b) $MV_{b,top} = MV_{f,top} - MV_{top}$;

where $TR_{B,top}$ corresponds to a temporal spacing
between the top field of the current macroblock and
the field of the past macroblock which is referenced
by $MV_{top}$, $TR_{D,top}$ corresponds to a temporal spacing
between the top field of the future macroblock and
the field of the past macroblock which is referenced
by $MV_{top}$, and $MV_{f,top}$ is the forward motion vector for
predicting the top field of the current macroblock.

25. The decoder of claim 24, further
comprising:

means for selecting said equation (a) when a
delta motion vector $MV_D = 0$; and

means for selecting said equation (b) when
$MV_D \neq 0$.

26. The decoder of one of claims 17 to 24, 
further comprising: 

means for determining $MV_{b,bot}$, the backward
motion vector for predicting the bottom field of the
current macroblock, according to one of the
equations
(a) $MV_{b,bot} = ((TR_{B,bot} - TR_{D,bot}) \cdot MV_{bot}) / TR_{D,bot}$ and
(b) $MV_{b,bot} = MV_{f,bot} - MV_{bot}$;

where $TR_{B,bot}$ corresponds to a temporal spacing
between the bottom field of the current macroblock
and the field of the past macroblock which is
referenced by $MV_{bot}$, $TR_{D,bot}$ corresponds to a temporal
spacing between the bottom field of the future
macroblock and the field of the past macroblock
which is referenced by $MV_{bot}$, and $MV_{f,bot}$ is the
forward motion vector for predicting the bottom
field of the current macroblock.

27. The decoder of claim 26, further
comprising:

means for selecting said equation (a) when a
delta motion vector $MV_D = 0$; and

means for selecting said equation (b) when
$MV_D \neq 0$.

3. Detailed Description of Invention

BACKGROUND OF THE INVENTION

The present invention provides a method and 
apparatus for coding of digital video images such as 
bi-directionally predicted video object planes
(B-VOPs), in particular, where the B-VOP and/or a
reference image used to code the B-VOP is interlaced
coded.

The invention is particularly suitable for use
with various multimedia applications, and is
compatible with the MPEG-4 Verification Model (VM)
8.0 standard (MPEG-4 VM 8.0) described in document
ISO/IEC/JTC1/SC29/WG11 N1796, entitled "MPEG-4 Video
Verification Model Version 8.0", Stockholm, July
1997, incorporated herein by reference. The MPEG-2
standard is a precursor to the MPEG-4 standard, and
is described in document ISO/IEC 13818-2, entitled
"Information Technology - Generic Coding of Moving
Pictures and Associated Audio, Recommendation
H.262," March 25, 1994, incorporated herein by
reference.

MPEG-4 is a coding standard which provides a
flexible framework and an open set of coding tools
for communication, access, and manipulation of
digital audio-visual data. These tools support a
wide range of features. The flexible framework of
MPEG-4 supports various combinations of coding tools
and their corresponding functionalities for
applications required by the computer,
telecommunication, and entertainment (i.e., TV and
film) industries, such as database browsing,
information retrieval, and interactive
communications.

MPEG-4 provides standardized core technologies
allowing efficient storage, transmission and
manipulation of video data in multimedia
environments. MPEG-4 achieves efficient
compression, object scalability, spatial and
temporal scalability, and error resilience.

The MPEG-4 video VM coder/decoder (codec) is a
block- and object-based hybrid coder with motion
compensation. Texture is encoded with an 8x8
Discrete Cosine Transformation (DCT) utilizing
overlapped block-motion compensation. Object shapes
are represented as alpha maps and encoded using a
Content-based Arithmetic Encoding (CAE) algorithm or
a modified DCT coder, both using temporal
prediction. The coder can handle sprites as they
are known from computer graphics. Other coding
methods, such as wavelet and sprite coding, may also
be used for special applications.

Motion compensated texture coding is a well
known approach for video coding, and can be modeled
as a three-stage process. The first stage is signal
processing which includes motion estimation and
compensation (ME/MC) and a two-dimensional (2-D)
spatial transformation. The objective of ME/MC and
the spatial transformation is to take advantage of
temporal and spatial correlations in a video
sequence to optimize the rate-distortion performance
of quantization and entropy coding under a
complexity constraint. The most common technique
for ME/MC has been block matching, and the most
common spatial transformation has been the DCT.

However, special concerns arise for ME/MC of
macroblocks (MBs) in B-VOPs when the MB is itself
interlaced coded and/or uses reference images which
are interlaced coded.

In particular, it would be desirable to have an
efficient technique for providing motion vector (MV)
predictors for a MB in a B-VOP. It would also be
desirable to have an efficient technique for direct
mode coding of a field coded MB in a B-VOP. It
would further be desirable to have a coding mode
decision process for a MB in a field coded B-VOP for
selecting the reference image which results in
the most efficient coding.

The present invention provides a system having
the above and other advantages.

SUMMARY OF THE INVENTION

In accordance with the present invention, a 
method and apparatus are presented for coding of 
digital video images such as a current image (e.g.,
macroblock) in a bi-directionally predicted video
object plane (B-VOP), in particular, where the
current image and/or a reference image used to code
the current image is interlaced (e.g., field) coded.

In a first aspect of the invention, a method
provides direct mode motion vectors (MVs) for a
current bi-directionally predicted, field coded image
such as a macroblock (MB) having top and bottom
fields, in a sequence of digital video images. A
past field coded reference image having top and
bottom fields, and a future field coded reference
image having top and bottom fields are determined.
The future image is predicted using the past image
such that $MV_{top}$, a forward MV of the top field of the
future image, references either the top or bottom
field of said past image. The field which is
referenced contains a best-match MB for a MB in the
top field of the future image.

This MV is termed a "forward" MV since, although
it references a past image (e.g., backward in time),
the prediction is from the past image to the future
image, e.g., forward in time. As a mnemonic, the
prediction direction may be thought of as being
opposite the direction of the corresponding MV.

Similarly, $MV_{bot}$, a forward motion vector of the
bottom field of the future image, references either
the top or bottom field of the past image. Forward
and backward MVs are determined for predicting the
top and/or bottom fields of the current image by
scaling the forward MV of the corresponding field of
the future image.

In particular, $MV_{f,top}$, the forward motion vector
for predicting the top field of the current image, is
determined according to the expression
$MV_{f,top} = (MV_{top} \cdot TR_{B,top}) / TR_{D,top} + MV_D$,
where $MV_D$ is a delta motion
vector for a search area, $TR_{B,top}$ corresponds to a
temporal spacing between the top field of the current
image and the field of the past image which is
referenced by $MV_{top}$, and $TR_{D,top}$ corresponds to a
temporal spacing between the top field of the future
image and the field of the past image which is
referenced by $MV_{top}$. The temporal spacing may be
related to a frame rate at which the images are
displayed.

Similarly, $MV_{f,bot}$, the forward motion vector for
predicting the bottom field of the current image, is
determined according to the expression
$MV_{f,bot} = (MV_{bot} \cdot TR_{B,bot}) / TR_{D,bot} + MV_D$,
where $MV_D$ is a delta motion
vector, $TR_{B,bot}$ corresponds to a temporal spacing
between the bottom field of the current image and the
field of the past image which is referenced by $MV_{bot}$,
and $TR_{D,bot}$ corresponds to a temporal spacing between
the bottom field of the future MB and the field of
the past MB which is referenced by $MV_{bot}$.

$MV_{b,top}$, the backward motion vector for
predicting the top field of the current MB, is
determined according to the equation
$MV_{b,top} = ((TR_{B,top} - TR_{D,top}) \cdot MV_{top}) / TR_{D,top}$
when the delta motion vector $MV_D = 0$, or
$MV_{b,top} = MV_{f,top} - MV_{top}$ when $MV_D \neq 0$.

$MV_{b,bot}$, the backward motion vector for
predicting the bottom field of the current MB, is
determined according to the equation
$MV_{b,bot} = ((TR_{B,bot} - TR_{D,bot}) \cdot MV_{bot}) / TR_{D,bot}$
when the delta motion vector $MV_D = 0$, or
$MV_{b,bot} = MV_{f,bot} - MV_{bot}$ when $MV_D \neq 0$.

A corresponding decoder is also presented. 
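
The direct-mode scaling described above can be
illustrated with a minimal sketch. It handles one MV
component (x or y) of one field at a time. The
function names are hypothetical, and the sketch
assumes that the backward MV uses the same
truncating integer division that the claims specify
for the forward MV.

```python
def trunc_div(n: int, d: int) -> int:
    """Integer division truncating toward zero (Python's // floors instead)."""
    q = abs(n) // abs(d)
    return q if (n >= 0) == (d > 0) else -q


def direct_mode_field_mv(mv_anchor: int, tr_b: int, tr_d: int, mv_d: int = 0):
    """Return (forward, backward) MV components for one field.

    mv_anchor: MV_top or MV_bot of the future anchor MB, in half-pel units.
    tr_b:      temporal spacing, current field to referenced past field.
    tr_d:      temporal spacing, future field to referenced past field.
    mv_d:      delta motion vector component.
    """
    # Forward MV: scale the anchor MV by tr_b / tr_d, then add the delta.
    mv_f = trunc_div(mv_anchor * tr_b, tr_d) + mv_d
    if mv_d == 0:
        # Equation (a): scale by (tr_b - tr_d) / tr_d.
        mv_b = trunc_div((tr_b - tr_d) * mv_anchor, tr_d)
    else:
        # Equation (b): derive the backward MV from the forward MV.
        mv_b = mv_f - mv_anchor
    return mv_f, mv_b


if __name__ == "__main__":
    print(direct_mode_field_mv(6, 1, 3))
    print(direct_mode_field_mv(-7, 1, 3, mv_d=1))
```

The same function would be called four times per MB
(two components for each of the top and bottom
fields), each time with that field's own temporal
spacings.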

In another aspect of the invention, a method is
presented for selecting a coding mode for a current
predicted, field coded MB having top and bottom
fields, in a sequence of digital video MBs. The
coding mode may be a backward mode, where the
reference MB is temporally after the current MB in
display order, a forward mode, where the reference MB
is before the current MB, or average (e.g.,
bi-directional) mode, where an average of prior and
subsequent reference MBs is used.

The method includes the step of determining a
forward sum of absolute differences error,
$SAD_{forward,field}$, for the current MB relative to a past
reference MB, which corresponds to a forward coding
mode. $SAD_{forward,field}$ indicates the error in pixel
luminance values between the current MB and a best
match MB in the past reference MB. A backward sum of
absolute differences error, $SAD_{backward,field}$, for the
current MB relative to a future reference MB, which
corresponds to a backward coding mode, is also
determined. $SAD_{backward,field}$ indicates the error in
pixel luminance values between the current MB and a
best match MB in the future reference MB.

An average sum of absolute differences error,
$SAD_{average,field}$, for the current MB relative to an
average of the past and future reference MBs, which
corresponds to an average coding mode, is also
determined. $SAD_{average,field}$ indicates the error in
pixel luminance values between the current MB and a
MB which is the average of the best match MBs of the
past and future reference MBs.

The coding mode is selected according to the
minimum of the SADs. Bias terms which account for
the number of required MVs of the respective coding
modes may also be factored into the coding mode
selection process.

$SAD_{forward,field}$, $SAD_{backward,field}$, and $SAD_{average,field}$
are determined by summing the component terms over
the top and bottom fields.
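
The mode selection described above can be sketched
as follows. This is an illustration only: the
function names, the representation of a field as a
list of pixel rows, the rounded averaging of the two
reference fields, and the form of the optional bias
terms are all assumptions, not details taken from
the text.

```python
def sad(block_a, block_b):
    """Sum of absolute luminance differences between two equal-size blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def average_block(p, q):
    """Pixel-wise average of two blocks (rounding half up is an assumption)."""
    return [[(a + b + 1) // 2 for a, b in zip(rp, rq)]
            for rp, rq in zip(p, q)]


def field_sads(cur_top, cur_bot, past_top, past_bot, fut_top, fut_bot):
    """Each SAD is the sum of a top-field term and a bottom-field term."""
    return {
        "forward": sad(cur_top, past_top) + sad(cur_bot, past_bot),
        "backward": sad(cur_top, fut_top) + sad(cur_bot, fut_bot),
        "average": (sad(cur_top, average_block(past_top, fut_top))
                    + sad(cur_bot, average_block(past_bot, fut_bot))),
    }


def select_mode(sads, bias=None):
    """Pick the mode with the smallest (optionally biased) SAD."""
    bias = bias or {}
    return min(sads, key=lambda mode: sads[mode] + bias.get(mode, 0))


if __name__ == "__main__":
    sads = field_sads([[10, 20]], [[30, 40]],
                      [[8, 20]], [[30, 44]],
                      [[12, 20]], [[30, 36]])
    print(sads, select_mode(sads))
```

The reference fields passed in are assumed to be the
best-match blocks already found by motion
estimation; the sketch covers only the comparison
step.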
DETAILED DESCRIPTION OF THE INVENTION

A method and apparatus are presented for coding
of a digital video image such as a macroblock (MB)
in a bi-directionally predicted video object plane
(B-VOP), in particular, where the MB and/or a
reference image used to code the MB is interlaced
coded. The scheme provides a method for selecting a
prediction motion vector (PMV) for the top and
bottom field of a field coded current MB, including
forward and backward PMVs as required, as well as
for frame coded MBs. A direct coding mode for a
field coded MB is also presented, in addition to a
coding decision process which uses the minimum of
sum of absolute differences terms to select an
optimum mode.

FIG. 1 is an illustration of a video object
plane (VOP) coding and decoding process in
accordance with the present invention. Frame 105
includes three pictorial elements, including a
square foreground element 107, an oblong foreground
element 108, and a landscape backdrop element 109.
In frame 115, the elements are designated VOPs using
a segmentation mask such that VOP 117 represents the
square foreground element 107, VOP 118 represents
the oblong foreground element 108, and VOP 119
represents the landscape backdrop element 109. A
VOP can have an arbitrary shape, and a succession of
VOPs is known as a video object. A full rectangular
video frame may also be considered to be a VOP.
Thus, the term "VOP" will be used herein to indicate
both arbitrary and non-arbitrary (e.g., rectangular)
image area shapes. A segmentation mask is obtained
using known techniques, and has a format similar to
that of ITU-R 601 luminance data. Each pixel is
identified as belonging to a certain region in the
video frame.

The frame 105 and VOP data from frame 115 are
supplied to separate encoding functions. In
particular, VOPs 117, 118 and 119 undergo shape,
motion and texture encoding at encoders 137, 138 and
139, respectively. With shape coding, binary and
gray scale shape information is encoded. With
motion coding, the shape information is coded using
motion estimation within a frame. With texture
coding, a spatial transformation such as the DCT is
performed to obtain transform coefficients which can
be variable-length coded for compression.

The coded VOP data is then combined at a
multiplexer (MUX) 140 for transmission over a
channel 145. Alternatively, the data may be stored
on a recording medium. The received coded VOP data
is separated by a demultiplexer (DEMUX) 150 so that
the separate VOPs 117-119 are decoded and recovered.
Frames 155, 165 and 175 show that VOPs 117, 118 and
119, respectively, have been decoded and recovered
and can therefore be individually manipulated using
a compositor 160 which interfaces with a video
library 170, for example.

The compositor may be a device such as a
personal computer which is located at a user's home
to allow the user to edit the received data to
provide a customized image. For example, the user's
personal video library 170 may include a previously
stored VOP 178 (e.g., a circle) which is different
than the received VOPs. The user may compose a
frame 185 where the circular VOP 178 replaces the
square VOP 117. The frame 185 thus includes the
received VOPs 118 and 119 and the locally stored VOP
178.

In another example, the background VOP 109 may
be replaced by a background of the user's choosing.
For example, when viewing a television news
broadcast, the announcer may be coded as a VOP which
is separate from the background, such as a news
studio. The user may select a background from the
library 170 or from another television program, such
as a channel with stock price or weather
information. The user can therefore act as a video
editor.

The video library 170 may also store VOPs which
are received via the channel 145, and may access
VOPs and other image elements via a network such as
the Internet. Generally, a video session comprises
a single VOP, or a sequence of VOPs.

The video object coding and decoding process of
FIG. 1 enables many entertainment, business and
educational applications, including personal
computer games, virtual environments, graphical user
interfaces, videoconferencing, Internet applications
and the like. In particular, the capability for
ME/MC with interlaced coded (e.g., field mode) VOPs
in accordance with the present invention provides
even greater capabilities.

FIG. 2 is a block diagram of an encoder in
accordance with the present invention. The encoder
is suitable for use with both predictive-coded VOPs
(P-VOPs) and bi-directionally coded VOPs (B-VOPs).

P-VOPs may include a number of macroblocks
(MBs) which may be coded individually using an
intra-frame mode or an inter-frame mode. With
intra-frame (INTRA) coding, the macroblock (MB) is
coded without reference to another MB. With
inter-frame (INTER) coding, the MB is differentially coded
with respect to a temporally subsequent frame in a
mode known as forward prediction. The temporally
subsequent frame is known as an anchor frame or
reference frame. The anchor frame (e.g., VOP) must
be a P-VOP or an I-VOP, not a B-VOP. An I-VOP
includes self-contained (e.g., intra-coded) blocks
which are not predictive coded.

With forward prediction, the current MB is
compared to a search area of MBs in the anchor frame
to determine the best match. A corresponding motion
vector (MV), known as a backward MV, describes the
displacement of the current MB relative to the best
match MB. Additionally, an advanced prediction mode
for P-VOPs may be used, where motion compensation is
performed on 8x8 blocks rather than 16x16 MBs.
Moreover, both intra-frame and inter-frame coded
P-VOP MBs can be coded in a frame mode or a field
mode.

B-VOPs can use the forward prediction mode as
described above in connection with P-VOPs as well as
backward prediction, bi-directional prediction, and
direct mode, which are all inter-frame techniques.
B-VOPs do not currently use intra-frame coded MBs
under MPEG-4 VM 8.0, although this is subject to
change. The anchor frame (e.g., VOP) must be a
P-VOP or I-VOP, not a B-VOP.

With backward prediction of B-VOPs, the current
MB is compared to a search area of MBs in a
temporally previous anchor frame to determine the
best match. A corresponding MV, known as a forward
MV, describes the displacement of the
current MB relative to the best match MB. With
bi-directional prediction of a B-VOP MB, the current MB
is compared to a search area of MBs in both a
temporally previous anchor frame and a temporally
subsequent anchor frame to determine the best match
MBs. Forward and backward MVs describe the
displacement of the current MB relative to the best
match MBs. Additionally, an averaged image is
obtained from the best match MBs for use in encoding
the current MB.

With direct mode prediction of B-VOPs, a MV is
derived for an 8x8 block when the collocated MB in
the following P-VOP uses the 8x8 advanced prediction
mode. The MV of the 8x8 block in the P-VOP is
linearly scaled to derive a MV for the block in the
B-VOP without the need for searching to find a best
match block.

The encoder, shown generally at 200, includes a
shape coder 210, a motion estimation function 220, a
motion compensation function 230, and a texture
coder 240, which each receive video pixel data input
at terminal 205. The motion estimation function
220, motion compensation function 230, texture coder
240, and shape coder 210 also receive VOP shape
information input at terminal 207, such as the
MPEG-4 parameter VOP_of_arbitrary_shape. When this
parameter is zero, the VOP has a rectangular shape,
and the shape coder 210 therefore is not used.

A reconstructed anchor VOP function 250
provides a reconstructed anchor VOP for use by the
motion estimation function 220 and motion
compensation function 230. A current VOP is
subtracted from a motion compensated previous VOP at
subtractor 260 to provide a residue which is encoded
at the texture coder 240. The texture coder 240
performs the DCT to provide texture information
(e.g., transform coefficients) to a multiplexer
(MUX) 280. The texture coder 240 also provides
information which is summed with the output from the
motion compensator 230 at a summer 270 for input to
the previous reconstructed VOP function 250.

Motion information (e.g., motion vectors) is
provided from the motion estimation function 220 to
the MUX 280, while shape information which indicates
the shape of the VOP is provided from the shape
coding function 210 to the MUX 280. The MUX 280
provides a corresponding multiplexed data stream to
a buffer 290 for subsequent communication over a
data channel.

The pixel data which is input to the encoder
may have a YUV 4:2:0 format. The VOP is represented
by means of a bounding rectangle. The top left
coordinate of the bounding rectangle is rounded to
the nearest even number not greater than the top
left coordinates of the tightest rectangle.
Accordingly, the top left coordinate of the bounding
rectangle in the chrominance component is one-half
that of the luminance component.

FIG. 3 illustrates an interpolation scheme for
a half-pixel search. Motion estimation and motion
compensation (ME/MC) generally involve matching a
block of a current video frame (e.g., a current
block) with a block in a search area of a reference
frame (e.g., a predicted block or reference block).
For predictive (P) coded images, the reference block
is in a previous frame. For bi-directionally
predicted (B) coded images, predicted blocks in
previous and subsequent frames may be used. The
displacement of the predicted block relative to the
current block is the motion vector (MV), which has
horizontal (x) and vertical (y) components.
Positive values of the MV components indicate that
the predicted block is to the right of, and below,
the current block.

A motion compensated difference block is formed
by subtracting the pixel values of the predicted
block from those of the current block point by
point. Texture coding is then performed on the
difference block. The coded MV and the coded
texture information of the difference block are
transmitted to the decoder. The decoder can then
reconstruct an approximated current block by adding
the quantized difference block to the predicted
block according to the MV. The block for ME/MC can
be a 16x16 frame block (macroblock), an 8x8 block, or
a 16x8 field block.

Accuracy of the MV is set at half-pixel.
Interpolation must be used on the anchor frame so
that p(i+x, j+y) is defined for x or y being half of an
integer. Interpolation is performed as shown in
FIG. 3. Integer pixel positions are represented by
the symbol "+", as shown at A, B, C and D.
Half-pixel positions are indicated by circles, as shown
at a, b, c and d. As seen, a = A, b = (A + B)//2,
c = (A + C)//2, and d = (A + B + C + D)//4, where
"//" denotes rounded division. Further details of
the interpolation are discussed in MPEG-4 VM 8.0
referred to previously as well as commonly assigned
U.S. Patent application Serial No. 08/897,847 to
Eifrig et al., filed July 21, 1997, entitled "Motion
Estimation and Compensation of Video Object Planes
for Interlaced Digital Video", incorporated herein
by reference.
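
The half-pixel interpolation described above can be
sketched as follows. The function names are
hypothetical, and the sketch assumes non-negative
pixel values, halves rounded upward, and a layout in
which B is to the right of A, C is below A, and D is
diagonal to A. Note that "//" in the text denotes
rounded division, whereas the Python operator of the
same spelling floors, so a helper is used.

```python
def rounded_div(n: int, d: int) -> int:
    """Rounded division for non-negative n (exact halves round up)."""
    return (2 * n + d) // (2 * d)


def half_pel_samples(A: int, B: int, C: int, D: int):
    """Return the samples (a, b, c, d) surrounding integer pixel A."""
    a = A                                # integer position, no filtering
    b = rounded_div(A + B, 2)            # horizontal half-pixel position
    c = rounded_div(A + C, 2)            # vertical half-pixel position
    d = rounded_div(A + B + C + D, 4)    # diagonal half-pixel position
    return a, b, c, d


if __name__ == "__main__":
    print(half_pel_samples(10, 13, 20, 22))
```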

FIG. 6 illustrates reordering of pixel lines in
an adaptive frame/field prediction scheme in
accordance with the present invention. In a first
aspect of the advanced prediction technique, an
adaptive technique is used to decide whether a
current macroblock (MB) of 16x16 pixels should be
ME/MC coded as is, or divided into four blocks of
8x8 pixels each, where each 8x8 block is ME/MC coded
separately, or whether field based motion estimation
should be used, where pixel lines of the MB are
reordered to group the same-field lines in two 16x8
field blocks, and each 16x8 block is separately
ME/MC coded.

A field mode 16x16 macroblock (MB) is shown
generally at 600. The MB includes even-numbered
lines 602, 604, 606, 608, 610, 612, 614 and 616, and
odd-numbered lines 603, 605, 607, 609, 611, 613, 615
and 617. The even and odd lines are thus
interleaved, and form top and bottom (or first and
second) fields, respectively.

When the pixel lines in image 600 are permuted
to form same-field luminance blocks, the MB shown
generally at 650 is formed. Arrows, shown generally
at 645, indicate the reordering of the lines
602-617. For example, the even line 602, which is the
first line of MB 600, is also the first line of MB
650. The even line 604 is reordered as the second
line in MB 650. Similarly, the even lines 606, 608,
610, 612, 614 and 616 are reordered as the third
through eighth lines, respectively, of MB 650.
Thus, a 16x8 luminance region 680 with even-numbered
lines is formed. Similarly, the odd-numbered lines
603, 605, 607, 609, 611, 613, 615 and 617 form a
16x8 region 685.
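
The line permutation described above can be
illustrated with a short sketch. A macroblock is
modeled as a list of 16 rows; the function names are
hypothetical.

```python
def to_field_order(frame_rows):
    """Group same-field lines: top-field (even) rows first, then bottom."""
    top = frame_rows[0::2]       # rows 0, 2, 4, ... form the top field
    bottom = frame_rows[1::2]    # rows 1, 3, 5, ... form the bottom field
    return top + bottom


def to_frame_order(field_rows):
    """Inverse permutation: re-interleave the two field blocks."""
    half = len(field_rows) // 2
    frame_rows = []
    for top_row, bottom_row in zip(field_rows[:half], field_rows[half:]):
        frame_rows.extend([top_row, bottom_row])
    return frame_rows


if __name__ == "__main__":
    # Use the reference numerals of the pixel lines as stand-ins for rows.
    lines = list(range(602, 618))
    print(to_field_order(lines))
```

Here the first eight entries of the result play the
role of region 680 and the last eight the role of
region 685.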

The decision process for choosing the MC mode
for P-VOPs is as follows. For frame mode video,
first obtain the Sum of Absolute Differences (SAD)
for a single 16x16 MB, e.g., $SAD_{16}(MV_x, MV_y)$; and for
four 8x8 blocks, e.g.,
$SAD_8(MV_{x1}, MV_{y1})$, $SAD_8(MV_{x2}, MV_{y2})$,
$SAD_8(MV_{x3}, MV_{y3})$, and $SAD_8(MV_{x4}, MV_{y4})$.
If $\sum_{i=1}^{4} SAD_8(MV_{xi}, MV_{yi}) < SAD_{16}(MV_x, MV_y) - 129$,
choose 8x8 prediction; otherwise, choose 16x16
prediction. The constant "129" is obtained from
Nb/2+1, where Nb is the number of non-transparent
pixels in a MB.

For interlaced video, obtain
$SAD_{top}(MV_{x\_top}, MV_{y\_top})$ and
$SAD_{bottom}(MV_{x\_bottom}, MV_{y\_bottom})$, where
$(MV_{x\_top}, MV_{y\_top})$ and $(MV_{x\_bottom}, MV_{y\_bottom})$
are the motion vectors (MVs) for
both top (even) and bottom (odd) fields. Then,
choose the reference field which has the smallest
SAD (e.g., for $SAD_{top}$ and $SAD_{bottom}$) from the field
half sample search.

The overall prediction mode decision is based
on choosing the minimum of:

(a) $SAD_{16}(MV_x, MV_y)$,
(b) $\sum_{i=1}^{4} SAD_8(MV_{xi}, MV_{yi}) + 129$, and
(c) $SAD_{top}(MV_{x\_top}, MV_{y\_top}) + SAD_{bottom}(MV_{x\_bottom}, MV_{y\_bottom}) + 65$.

If term (a) is the minimum, 16x16 prediction is
used. If term (b) is the minimum, 8x8 motion
compensation (advanced prediction mode) is used. If
term (c) is the minimum, field based motion
estimation is used. The constant "65" is obtained
from Nb/4+1.
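
The overall decision above can be sketched as a
comparison of three biased terms. The sketch assumes
that terms (b) and (c) add the constants Nb/2+1 and
Nb/4+1 as penalties, consistent with the values
"129" and "65" for a fully opaque MB (Nb = 256). The
function and parameter names are hypothetical.

```python
def choose_prediction_mode(sad16, sad8_blocks, sad_top, sad_bottom, nb=256):
    """Return "16x16", "8x8" or "field" for a P-VOP macroblock.

    sad16:       SAD of the single 16x16 match.
    sad8_blocks: SADs of the four 8x8 block matches.
    sad_top:     SAD of the best top-field 16x8 match.
    sad_bottom:  SAD of the best bottom-field 16x8 match.
    nb:          number of non-transparent pixels in the MB.
    """
    candidates = {
        "16x16": sad16,                                   # term (a)
        "8x8": sum(sad8_blocks) + nb // 2 + 1,            # term (b)
        "field": sad_top + sad_bottom + nb // 4 + 1,      # term (c)
    }
    # On ties, the first candidate listed wins (an arbitrary choice here).
    return min(candidates, key=candidates.get)


if __name__ == "__main__":
    print(choose_prediction_mode(1000, [200, 200, 200, 200], 400, 400))
```

The bias terms favor the modes that need fewer MVs,
so a mode with more MVs is chosen only when its
prediction error is clearly smaller.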

If 8x8 prediction is chosen, there are four MVs
for the four 8x8 luminance blocks, i.e., one MV for
each 8x8 block. The MV for the two chrominance
blocks is then obtained by taking an average of
these four MVs and dividing the average value by
two. Since each MV for the 8x8 luminance block has
a half-pixel accuracy, the MV for the chrominance
blocks may have a sixteenth pixel value. Table 1,
below, specifies the conversion of a sixteenth pixel
value to a half-pixel value for chrominance MVs.
For example, 0 through 2/16 are rounded to 0, 3/16
through 13/16 are rounded to 1/2, and 14/16 and
15/16 are rounded to 2/2=1.



Table 1

1/16 pixel value:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
1/2 pixel value:   0  0  0  1  1  1  1  1  1  1  1  1  1  1  2  2
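
The conversion in Table 1 can be applied as in the
following sketch, which derives one chrominance MV
component from the four 8x8 luminance MV components.
The names are hypothetical. The sketch assumes the
luminance components are integers in half-pixel
units, so that their sum is the chrominance
displacement in sixteenth-pixel units, and it
assumes that negative values are handled by rounding
the magnitude and restoring the sign.

```python
# Table 1: sixteenth-pixel fraction -> half-pixel units.
SIXTEENTH_TO_HALF = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2]


def chroma_mv_component(luma_components):
    """Return the chrominance MV component in half-pixel units.

    luma_components: the four luminance MV components (half-pixel units).
    Average of four, divided by two, equals sum/8 half-pixels, i.e. the
    sum is the displacement expressed in sixteenths of a pixel.
    """
    total = sum(luma_components)
    magnitude = abs(total)
    whole_pixels, fraction = divmod(magnitude, 16)
    half_pel = 2 * whole_pixels + SIXTEENTH_TO_HALF[fraction]
    return half_pel if total >= 0 else -half_pel


if __name__ == "__main__":
    print(chroma_mv_component([3, 3, 3, 3]))
```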



With field prediction, there are two MVs for
the two 16x8 blocks. The luminance prediction is
generated as follows. The even lines of the MB
(e.g., lines 602, 604, 606, 608, 610, 612, 614 and
616) are defined by the top field MV using the
reference field specified. The MV is specified in
frame coordinates such that full pixel vertical
displacements correspond to even integral values of
the vertical MV coordinate, and a half-pixel
vertical displacement is denoted by odd integral
values. When a half-pixel vertical offset is
specified, only pixels from lines within the same
reference field are combined.

The MV for the two chrominance blocks is
derived from the (luminance) MV by dividing each
component by two, then rounding. The horizontal
component is rounded by mapping all fractional
values into a half-pixel offset. The vertical MV
component is an integer and the resulting
chrominance MV vertical component is rounded to an
integer. If the result of dividing by two yields a
non-integral value, it is rounded to the adjacent
odd integer. Note that the odd integral values
denote vertical interpolation between lines of the
same field.

The second aspect of the advanced prediction
technique is overlapped MC for luminance blocks,
discussed in greater detail in MPEG-4 VM 8.0 and
the Eifrig et al. application referred to previously.

Specific coding techniques for B-VOPs are now
discussed. For INTER coded VOPs such as B-VOPs,
there are four prediction modes, namely, direct
mode, interpolate (e.g., averaged or bi-directional)
mode, backward mode, and forward mode. The latter
three modes are non-direct modes. Forward only, or
backward only prediction are also known as
"unidirectional" prediction. The predicted blocks
of the B-VOP are determined differently for each
mode. Furthermore, blocks of a B-VOP and the anchor
block(s) may be progressive (e.g., frame) coded or
interlaced (e.g., field) coded.

A single B-VOP can have different MBs which are
predicted with different modes. The term "B-VOP"
only indicates that bi-directionally predicted
blocks may be included, but this is not required.
In contrast, with P-VOPs and I-VOPs,
bi-directionally predicted MBs are not used.

For non-direct mode B-VOP MBs, MVs are coded differentially. For
forward MVs in forward and bi-directional modes, and backward MVs in
backward and bi-directional modes, the "same-type" MV (e.g., forward
or backward) of the MB which immediately precedes the current MB in
the same row is used as a predictor. This is the same as the
immediately preceding MB in raster order, and generally, in
transmission order. However, if the raster order differs from the
transmission order, the MVs of the immediately preceding MB in
transmission order should be used to avoid the need to store and
re-order the MBs and corresponding MVs at the decoder.

Using the same-type MV, and assuming the transmission order is the
same as the raster order, and that the raster order is from left to
right, top to bottom, the forward MV of the left-neighboring MB is
used as a predictor for the forward MV of the current MB of the
B-VOP. Similarly, the backward MV of the left-neighboring MB is used
as a predictor for the backward MV of the current MB of the B-VOP.
The MVs of the current MB are then differentially encoded using the
predictors. That is, the difference between the predictor and the MV
which is determined for the current MB is transmitted as a motion
vector difference to a decoder. At the decoder, the MV of the current
MB is determined by recovering and adding the PMV and the difference
MV.

In case the current MB is located on the left edge of the VOP, the
predictor for the current MB is set to zero.
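
The left-neighbor prediction described above can be sketched as
follows for a single row of MBs carrying one same-type MV each. This
is an illustration only: the function names are hypothetical, and the
sketch assumes transmission order equals raster order.

```python
from typing import List, Tuple

MV = Tuple[int, int]  # (horizontal, vertical), half-pixel units


def encode_row(mvs: List[MV]) -> List[MV]:
    """Return the motion vector differences for one row of MBs."""
    pmv = (0, 0)                      # predictor is zero at the left edge
    diffs = []
    for mv in mvs:
        diffs.append((mv[0] - pmv[0], mv[1] - pmv[1]))
        pmv = mv                      # current MV predicts the next MB
    return diffs


def decode_row(diffs: List[MV]) -> List[MV]:
    """Recover the MVs by adding each difference to its predictor."""
    pmv = (0, 0)
    mvs = []
    for d in diffs:
        mv = (pmv[0] + d[0], pmv[1] + d[1])
        mvs.append(mv)
        pmv = mv
    return mvs
```

Because neighboring MBs tend to move together, the differences are
usually small, which is what makes the differential coding worthwhile.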

For interlaced-coded B-VOPs, each of the top and bottom fields has
two associated prediction motion vectors, for a total of four MVs.
The four prediction MVs represent, in transmission order, the top
field forward and bottom field forward of the previous anchor MB, and
the top field backward and bottom field backward of the next anchor
MB. The current MB and the forward MB, and/or the current MB and the
backward MB, may be separated by one or more intermediate images
which are not used for ME/MC coding of the current MB. B-VOPs do not
contain INTRA coded MBs, so each MB in the B-VOP will be ME/MC coded.
The forward and backward anchor MBs may be from a P-VOP or I-VOP, and
may be frame or field coded.

For interlaced, non-direct mode B-VOP MBs, four possible prediction
motion vectors (PMVs) are shown in Table 2 below. The first column of
Table 2 shows the prediction function, while the second column shows
a designator for the PMV. These PMVs are used as shown in Table 3
below for the different MB prediction modes.

Table 2

Prediction function        PMV type
Top field, forward         0
Bottom field, forward      1
Top field, backward        2
Bottom field, backward     3

Table 3

Macroblock mode            PMV type used
Frame, forward             0
Frame, backward            2
Frame, bi-directional      0,2
Field, forward             0,1
Field, backward            2,3
Field, bi-directional      0,1,2,3



For example, Table 3 shows that, for a current field mode MB with a
forward prediction mode (e.g., "Field, forward"), top field forward
("0") and bottom field forward ("1") motion vector predictors are
used.

After being used in differential coding, the motion vectors of a
current MB become the PMVs for a subsequent MB, in transmission
order. The PMVs are reset to zero at the beginning of each row of MBs
since the MVs of an MB at the end of a preceding row are unlikely to
be similar to the MVs of an MB at the beginning of a current row. The
predictors are also not used for direct mode MBs. For skipped MBs,
the PMVs retain the last value.
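
A decoder-side sketch of this predictor bookkeeping is given below.
The class and method names are hypothetical. Two points are
assumptions rather than statements from the text: only the PMV entries
listed in Table 3 for the current mode are updated, and direct mode
MBs leave the PMVs unchanged.

```python
from typing import Dict, List, Tuple

MV = Tuple[int, int]

# PMV indices per macroblock mode, transcribed from Table 3.
PMV_TYPES: Dict[str, List[int]] = {
    "frame_forward": [0],
    "frame_backward": [2],
    "frame_bidirectional": [0, 2],
    "field_forward": [0, 1],
    "field_backward": [2, 3],
    "field_bidirectional": [0, 1, 2, 3],
}


class PmvState:
    """Hypothetical holder for the four PMVs while decoding a B-VOP."""

    def __init__(self) -> None:
        self.pmv: List[MV] = [(0, 0)] * 4

    def start_row(self) -> None:
        self.pmv = [(0, 0)] * 4          # reset at the start of each row

    def decode_mb(self, mode: str, diffs: List[MV]) -> List[MV]:
        if mode in ("direct", "skipped"):
            return []                    # PMVs neither used nor changed
        mvs = []
        for idx, d in zip(PMV_TYPES[mode], diffs):
            mv = (self.pmv[idx][0] + d[0], self.pmv[idx][1] + d[1])
            self.pmv[idx] = mv           # assumption: update the same entry
            mvs.append(mv)
        return mvs
```

A caller would invoke `start_row()` before the first MB of each row and
`decode_mb()` once per MB, passing the decoded differences in the
order given by Table 3.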

With direct mode coding of B-VOP MBs, no vector differences are
transmitted. Instead, the forward and backward MVs are directly
computed at the decoder from the MVs of the temporally next P-VOP MB,
with correction by a single delta MV, which is not predicted. The
technique is efficient since less MV data is transmitted.

Table 4 below summarizes which PMVs are used to code the motion
vectors of the current B-VOP MB based on the previous and current MB
types. For B-VOPs, an array of prediction motion vectors, pmv[ ], may
be provided, indexed from zero to three (e.g., pmv[0], pmv[1], pmv[2]
and pmv[3]). The indexes of pmv[ ] are not transmitted, but the
decoder can determine the pmv[ ] index to use according to the MV
coding type and the particular vector being decoded. After coding a
B-VOP MB, some of the PMVs are updated to be the same as the motion
vectors of the current MB. Thus, the first one, two or four PMVs are
updated depending on the number of MVs associated with the current
MB.

For example, a forward, field predicted MB has two motion vectors,
where pmv[0] is the PMV for the top field, forward, and pmv[1] is the
PMV for the bottom field, forward. For a backward, field predicted
MB, pmv[2] is the PMV for the top field, backward, and pmv[3] is the
PMV for the bottom field, backward. For a bi-directional, field
predicted MB, pmv[0] is the PMV for the top field, forward, pmv[1] is
the PMV for the bottom field, forward, pmv[2] is the PMV for the top
field, backward, and pmv[3] is the PMV for the bottom field,
backward. For a forward or backward predicted frame mode B-VOP MB,
there is only one MV, so only pmv[0] is used for forward, and pmv[2]
is used for backward. For an average (e.g., bi-directionally)
predicted frame mode B-VOP MB, there are two MVs, namely, pmv[0] for
the forward MV, and pmv[2] for the backward MV. The row designated
"pmv[ ]'s to update" indicates whether one, two or four MVs are
updated.

It will be appreciated that Table 4 is merely a shorthand notation
for implementing the technique of the present invention for selecting
a prediction MV for a current MB. However, the scheme may be
expressed in various other ways.

Intra block DC adaptive prediction can use the same algorithm as
described in MPEG-4 VM 8.0 regardless of the value of dct_type. Intra
block adaptive AC prediction is performed as described in MPEG-4 VM
8.0 except when the first row of coefficients is to be copied from
the coded block above. This operation is allowed only if dct_type has
the same value for the current MB and the block above. If the
dct_types differ, then AC prediction can occur only by copying the
first column from the block to the left. If there is no left block,
zero is used for the AC predictors.

FIG. 4 illustrates direct mode coding of the top field of an
interlaced-coded B-VOP in accordance with the present invention.
Progressive direct coding mode is used for the current macroblock
(MB) whenever the MB in a future anchor picture which is at the same
relative position (e.g., co-sited) as the current MB is coded as (1)
a 16x16 (frame) MB, (2) an intra MB or (3) an 8x8 (advanced
prediction) MB.

The direct mode prediction is interlaced whenever the co-sited future
anchor picture MB is coded as an interlaced MB. Direct mode will be
used to code the current MB if its biased SAD is the minimum of all
B-VOP MB predictors. Direct mode for an interlaced coded MB forms the
prediction MB separately for the top and bottom fields of the current
MB. The four field motion vectors (MVs) of a bi-directional field
motion compensated MB (e.g., top field forward, bottom field forward,
top field backward, and bottom field backward) are calculated
directly from the respective MVs of the corresponding MB of the
future anchor picture.

The technique is efficient since the required searching is
significantly reduced, and the amount of transmitted MV data is
reduced. Once the MVs and reference field are determined, the current
MB is considered to be a bi-directional field predicted MB. Only one
delta motion vector (used for both fields) occurs in the bitstream
for the field predicted MB.

The prediction for the top field of the current MB is based on the
top field MV of the MB of the future anchor picture (which can be a
P-VOP, or an I-VOP with MV=0), and a past reference field of a
previous anchor picture which is selected by the corresponding MV of
the top field of the future anchor MB. That is, the top field MB of
the future anchor picture which is correspondingly positioned (e.g.,
co-sited) to the current MB has a best match MB in either the top or
bottom field of the past anchor picture. This best match MB is then
used as the anchor MB for the top field of the current MB. An
exhaustive search is used to determine the delta motion vector $MV_D$
given the co-sited future anchor MV on a MB by MB basis.

Motion vectors for the bottom field of the current MB are similarly
determined using the MV of the correspondingly positioned bottom
field of the future anchor MB, which in turn references a best match
MB in the top or bottom field of the past anchor picture.

Essentially, the top field motion vector is used to construct an MB
predictor which is the average of (a) pixels obtained from the top
field of the correspondingly positioned future anchor MB and (b)
pixels from the past anchor field referenced by the top field MV of
the correspondingly positioned future anchor MB. Similarly, the
bottom field motion vector is used to construct an MB predictor which
is the average of (a) pixels obtained from the bottom field of the
correspondingly positioned future anchor MB and (b) pixels from the
past anchor field referenced by the bottom field MV of the
correspondingly positioned future anchor MB.

As shown in FIG. 4, the current B-VOP MB 420 includes a top field 430
and bottom field 425, the past anchor VOP MB 400 includes a top field
410 and bottom field 405, and the future anchor VOP MB 440 includes a
top field 450 and bottom field 445.

The motion vector $MV_{top}$ is the forward motion vector for the top
field 450 of the future anchor MB 440 which indicates the best match
MB in the past anchor MB 400. Even though $MV_{top}$ is referencing a
previous image (e.g., backward in time), it is a forward MV since the
future anchor VOP 440 is forward in time relative to the past anchor
VOP 400. In the example, $MV_{top}$ references the bottom field 405
of the past anchor MB 400, although either the top 410 or bottom 405
field could be referenced. $MV_{f,top}$ is the forward MV of the top
field of the current MB, and $MV_{b,top}$ is the backward MV of the
top field of the current MB. Pixel data is derived for the
bi-directionally predicted MB at a decoder by averaging the pixel
data in the future and past anchor images which are identified by
$MV_{b,top}$ and $MV_{f,top}$, respectively, and summing the averaged
image with a residue which was transmitted.

The motion vectors for the top field are calculated as follows:

$$MV_{f,top} = (TR_{B,top} \cdot MV_{top}) / TR_{D,top} + MV_D;$$
$$MV_{b,top} = ((TR_{B,top} - TR_{D,top}) \cdot MV_{top}) / TR_{D,top} \quad \text{if } MV_D = 0; \text{ and}$$
$$MV_{b,top} = MV_{f,top} - MV_{top} \quad \text{if } MV_D \neq 0.$$

$MV_D$ is a delta, or offset, motion vector. Note that the motion
vectors are two-dimensional. Additionally, the motion vectors are
integral half-pixel luma motion vectors. The slash denotes
truncate-toward-zero integer division. Also, the future anchor VOP is
always a P-VOP for field direct mode. If the future anchor was an
I-VOP, the MV would be zero and 16x16 progressive direct mode would
be used. $TR_{B,top}$ is the temporal distance in fields between the
past reference field (e.g., top or bottom), which is the bottom field
405 in this example, and the top field 430 of the current B-VOP 420.
$TR_{D,top}$ is the temporal distance between the past reference
field (e.g., top or bottom), which is the bottom field 405 in this
example, and the future top reference field 450.
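
The direct mode calculation for one field can be sketched as below.
The function names are hypothetical. The sketch applies the test on
the delta vector separately to the horizontal and vertical components,
which is an assumption: the text only states the condition for the
vector as a whole. The temporal distance to the future anchor is
assumed to be positive.

```python
from typing import Tuple

MV = Tuple[int, int]


def trunc_div(a: int, b: int) -> int:
    """Integer division that truncates toward zero (Python's // floors)."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def direct_field_mvs(mv_anchor: MV, tr_b: int, tr_d: int, mv_d: MV) -> Tuple[MV, MV]:
    """Return (forward, backward) MVs for one field of the current MB.

    mv_anchor is the co-sited future anchor field MV, tr_b and tr_d are
    the temporal distances in field periods, and mv_d is the delta MV.
    """
    fwd, bwd = [], []
    for comp, delta in zip(mv_anchor, mv_d):
        f = trunc_div(tr_b * comp, tr_d) + delta
        if delta == 0:
            b = trunc_div((tr_b - tr_d) * comp, tr_d)
        else:
            b = f - comp
        fwd.append(f)
        bwd.append(b)
    return tuple(fwd), tuple(bwd)
```

The same function serves the top and bottom fields; only the anchor MV
and the two temporal distances change.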

FIG. 5 illustrates direct mode coding of the bottom field of an
interlaced-coded B-VOP in accordance with the present invention. Note
that the source interlaced video can have a top field first or bottom
field first format. A bottom field first format is shown in FIGs. 4
and 5. Like-numbered elements are the same as in FIG. 4. Here, the
motion vector $MV_{bot}$ is the forward motion vector for the bottom
field 445 of the future anchor macroblock (MB) 440 which indicates
the best match MB in the past anchor MB 400. In the example,
$MV_{bot}$ references the bottom field 405 of the past anchor MB 400,
although either the top 410 or bottom 405 field could be used.
$MV_{f,bot}$ and $MV_{b,bot}$ are the forward and backward motion
vectors, respectively.

The motion vectors for the bottom field are calculated in a parallel
manner to the top field motion vectors, as follows:

$$MV_{f,bot} = (TR_{B,bot} \cdot MV_{bot}) / TR_{D,bot} + MV_D;$$
$$MV_{b,bot} = ((TR_{B,bot} - TR_{D,bot}) \cdot MV_{bot}) / TR_{D,bot} \quad \text{if } MV_D = 0; \text{ and}$$
$$MV_{b,bot} = MV_{f,bot} - MV_{bot} \quad \text{if } MV_D \neq 0.$$

$TR_{B,bot}$ is the temporal distance between the past reference
field (e.g., top or bottom), which is the bottom field 405 in this
example, and the bottom field 425 of the current B-VOP 420.
$TR_{D,bot}$ is the temporal distance between the past reference
field (e.g., top or bottom), which is the bottom field 405 in this
example, and the future bottom reference field 445.

Regarding the examples of FIGs. 4 and 5, the calculation of
$TR_{B,top}$, $TR_{D,top}$, $TR_{B,bot}$ and $TR_{D,bot}$ depends not
only on the current field, reference field, and frame temporal
references, but also on whether the current video is top field first
or bottom field first. In particular,

$$TR_{D,top} \text{ or } TR_{D,bot} = 2 \cdot (TR_{future} - TR_{past}) + \delta; \text{ and}$$
$$TR_{B,top} \text{ or } TR_{B,bot} = 2 \cdot (TR_{current} - TR_{past}) + \delta;$$

where $TR_{future}$, $TR_{current}$, and $TR_{past}$ are the frame
numbers of the future, current and past frames, respectively, in
display order, and $\delta$, an additive correction to the temporal
distance between fields, is given by Table 5, below. $\delta$ has
units of field periods.

For example, the designation "1" in the last row of the first column
indicates that the future anchor field is the top field, and the
referenced field is the bottom field. This is shown in FIG. 4. The
designation "1" in the last row of the second column indicates that
the future anchor field is the bottom field, and the referenced field
is also the bottom field. This is shown in FIG. 5.



Table 5 - Temporal correction, δ

Referenced field               Bottom field first        Top field first
Future anchor  Future anchor   Top field  Bottom field   Top field  Bottom field
= top          = bottom        δ          δ              δ          δ
top            top              0         -1              0          1
top            bottom           0          0              0          0
bottom         top              1         -1             -1          1
bottom         bottom           1          0             -1          0
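
The temporal distance formulas and the correction of Table 5 can be
combined in a short sketch. The dictionary keys and the function name
are hypothetical, and the way the table is keyed here (field order,
field of the current MB, field referenced by the corresponding future
anchor MV) is one reading of Table 5 that should be checked against
it.

```python
# DELTA[(field_order, current_field, referenced_field)], in field periods.
# This keying is an interpretation of Table 5, not a quotation of it.
DELTA = {
    ("bottom_first", "top", "top"): 0,
    ("bottom_first", "top", "bottom"): 1,
    ("bottom_first", "bottom", "top"): -1,
    ("bottom_first", "bottom", "bottom"): 0,
    ("top_first", "top", "top"): 0,
    ("top_first", "top", "bottom"): -1,
    ("top_first", "bottom", "top"): 1,
    ("top_first", "bottom", "bottom"): 0,
}


def temporal_distances(tr_future, tr_current, tr_past,
                       field_order, current_field, referenced_field):
    """Return (TR_B, TR_D) in field periods for one field of the current MB."""
    delta = DELTA[(field_order, current_field, referenced_field)]
    tr_d = 2 * (tr_future - tr_past) + delta
    tr_b = 2 * (tr_current - tr_past) + delta
    return tr_b, tr_d
```

With bottom field first video, frame numbers 0, 1 and 3 for the past,
current and future frames, and a top field MV that references the past
bottom field, the sketch yields 3 and 7 field periods, respectively.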



For efficient coding, an appropriate coding mode decision process is
required. As indicated, for B-VOPs, an MB can be coded using (1)
direct coding, (2) 16x16 motion compensation (includes forward,
backward and averaged modes), or (3) field motion compensation
(includes forward, backward and averaged modes). Frame or field
direct coding of a current MB is used when the corresponding future
anchor MB is frame or field direct coded, respectively.

For a field motion compensated MB in a B-VOP, a decision is made to
code the MB in a forward, backward, or averaged mode based on the
minimum luminance half-pixel SADs with respect to the decoded anchor
pictures. Specifically, seven biased SAD terms are calculated as
follows:

(1) $SAD_{direct} + b_1$, (2) $SAD_{forward} + b_2$,
(3) $SAD_{backward} + b_2$, (4) $SAD_{average} + b_3$,
(5) $SAD_{forward,field} + b_3$, (6) $SAD_{backward,field} + b_3$,
and (7) $SAD_{average,field} + b_4$,

where the subscripts indicate direct mode, forward motion prediction,
backward motion prediction, average (i.e., interpolated or
bi-directional) motion prediction, frame mode (i.e., locally
progressive) and field mode (i.e., locally interlaced). The field
SADs above (i.e., $SAD_{forward,field}$, $SAD_{backward,field}$, and
$SAD_{average,field}$) are the sums of the top and bottom field SADs,
each with its own reference field and motion vector. Specifically,

$$SAD_{forward,field} = SAD_{forward,top\ field} + SAD_{forward,bottom\ field};$$
$$SAD_{backward,field} = SAD_{backward,top\ field} + SAD_{backward,bottom\ field}; \text{ and}$$
$$SAD_{average,field} = SAD_{average,top\ field} + SAD_{average,bottom\ field}.$$

$SAD_{direct}$ is the best direct mode prediction, $SAD_{forward}$ is
the best 16x16 prediction from the forward (past) reference,
$SAD_{backward}$ is the best 16x16 prediction from the backward
(future) reference, $SAD_{average}$ is the best 16x16 prediction
formed by a pixel-by-pixel average of the best forward and best
backward reference, $SAD_{forward,field}$ is the best field
prediction from the forward (past) reference, $SAD_{backward,field}$
is the best field prediction from the backward (future) reference,
and $SAD_{average,field}$ is the best field prediction formed by a
pixel-by-pixel average of the best forward and best backward
reference.

The $b_i$'s are bias values as defined in Table 6, below, to account
for prediction modes which require more motion vectors. Direct mode
and modes with fewer MVs are favored.



Table 6

Mode              Number of        b_i    Bias           Value
                  motion vectors
Direct            1                b_1    -(Nb/2 + 1)    -129
Frame, forward    1                b_2    0              0
Frame, backward   1                b_2    0              0
Frame, average    2                b_3    (Nb/4 + 1)     65
Field, forward    2                b_3    (Nb/4 + 1)     65
Field, backward   2                b_3    (Nb/4 + 1)     65
Field, average    4                b_4    (Nb/2 + 1)     129



The negative bias for direct mode is for consistency with the
existing MPEG-4 VM for progressive video, and may result in
relatively more skipped MBs.
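
The biased mode decision can be sketched as a simple minimum search
over the seven terms. The mode names and the function are
hypothetical, and the value Nb = 256 is an assumption inferred from
the numeric column of Table 6 (Nb/2 + 1 = 129 and Nb/4 + 1 = 65).

```python
NB = 256  # assumption: inferred from the values listed in Table 6

BIAS = {
    "direct": -(NB // 2 + 1),
    "frame_forward": 0,
    "frame_backward": 0,
    "frame_average": NB // 4 + 1,
    "field_forward": NB // 4 + 1,
    "field_backward": NB // 4 + 1,
    "field_average": NB // 2 + 1,
}


def select_mode(sads):
    """Pick the mode whose biased SAD is smallest.

    `sads` maps each mode name to its best (unbiased) SAD; for the field
    modes this is already the sum of the top and bottom field SADs.
    """
    return min(sads, key=lambda mode: sads[mode] + BIAS[mode])
```

When all seven unbiased SADs are equal, the negative bias makes direct
mode win, which matches the stated preference for direct mode and for
modes with fewer MVs.
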

FIG. 7 is a block diagram of a decoder in accordance with the present
invention. The decoder, shown generally at 700, can be used to
receive and decode the encoded data signals transmitted from the
encoder of FIG. 2. The encoded video image data and differentially
encoded motion vector (MV) data are received at terminal 740 and
provided to a demultiplexer (DEMUX) 742. The encoded video image data
is typically differentially encoded in DCT transform coefficients as
a prediction error signal (e.g., residue).

A shape decoding function 744 processes the data when the VOP has an
arbitrary shape to recover shape information, which is, in turn,
provided to a motion compensation function 750 and a VOP
reconstruction function 752. A texture decoding function 746 performs
an inverse DCT on transform coefficients to recover residue
information. For INTRA coded macroblocks (MBs), pixel information is
recovered directly and provided to the VOP reconstruction function
752.

For INTER coded blocks and MBs, such as those in B-VOPs, the pixel
information provided from the texture decoding function 746 to the
VOP reconstruction function 752 represents a residue between the
current MB and a reference image. The reference image may be pixel
data from a single anchor MB which is indicated by a forward or
backward MV. Alternatively, for an interpolated (e.g., averaged) MB,
the reference image is an average of pixel data from two reference
MBs, e.g., one past anchor MB and one future anchor MB. In this case,
the decoder must calculate the averaged pixel data according to the
forward and backward MVs before recovering the current MB pixel data.

For INTER coded blocks and MBs, a motion decoding function 748
processes the encoded MV data to recover the differential MVs and
provide them to the motion compensation function 750 and to a motion
vector memory 749, such as a RAM. The motion compensation function
750 receives the differential MV data and determines a reference
motion vector (e.g., predictor motion vector, or PMV) in accordance
with the present invention. The PMV is determined according to the
coding mode (e.g., forward, backward, bi-directional, or direct).

Once the motion compensation function 750 determines a full reference
MV and sums it with the differential MV of the current MB, the full
MV of the current MB is available. Accordingly, the motion
compensation function 750 can now retrieve anchor frame best match
data from a VOP memory 754, such as a RAM, calculate an averaged
image if required, and provide the anchor frame pixel data to the VOP
reconstruction function to reconstruct the current MB.

The retrieved or calculated best match data is added back to the
pixel residue at the VOP reconstruction function 752 to obtain the
decoded current MB or block. The reconstructed block is output as a
video output signal and also provided to the VOP memory 754 to
provide new anchor frame data. Note that an appropriate video data
buffering capability may be required depending on the frame
transmission and presentation orders since an anchor frame for a
B-VOP MB may be a temporally future frame or field, in presentation
order.
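
The reconstruction step can be sketched as follows. The function name
is hypothetical, and two details are assumptions not given in the
text: the average is rounded as (a + b + 1) // 2, and the result is
clamped to the 8-bit range 0..255.

```python
def reconstruct_block(residue, forward_pred, backward_pred=None):
    """Rebuild decoded pixels from a residue and one or two predictions.

    All arguments are equally sized lists of rows of integers. Pass
    only forward_pred for a unidirectional MB, or both predictions for
    an averaged (bi-directional) MB.
    """
    out = []
    for y, res_row in enumerate(residue):
        row = []
        for x, res in enumerate(res_row):
            pred = forward_pred[y][x]
            if backward_pred is not None:
                # assumed rounding of the two-reference average
                pred = (pred + backward_pred[y][x] + 1) // 2
            # assumed clamp to the 8-bit pixel range
            row.append(max(0, min(255, pred + res)))
        out.append(row)
    return out
```

In a decoder along the lines of FIG. 7, the two predictions would be
the best match data fetched from the VOP memory using the forward and
backward MVs.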

FIG. 8 illustrates a MB packet structure in accordance with the
present invention. The structure is suitable for B-VOPs, and
indicates the format of data received by the decoder. Note that the
packets are shown in four rows for convenience only. The packets are
actually transmitted serially, starting from the top row, and from
left to right within a row. The first row 810 includes fields
first_shape_code, MVD_shape, CR, ST and BAC. A second row 830
includes fields MODB and MBTYPE. A third row 850 includes fields
CBPB, DQUANT, Interlaced_information, $MVD_f$, $MVD_b$, and MVDB. A
fourth row includes fields CODA, CBPBA, Alpha Block Data and Block
Data. Each of the above fields is defined according to MPEG-4 VM 8.0.

first_shape_code indicates whether a MB is in a bounding box of a
VOP. CR indicates a conversion ratio for Binary Alpha Blocks. ST
indicates a horizontal or vertical scan order. BAC refers to a binary
arithmetic codeword.

MODB, which indicates the mode of a MB, is present for every coded
(non-skipped) MB in a B-VOP. Difference motion vectors ($MVD_f$,
$MVD_b$, or MVDB) and CBPB are present if indicated by MODB.
Macroblock type is indicated by MBTYPE, which also signals motion
vector modes (MVDs) and quantization (DQUANT). With interlaced mode,
there can be up to four MVs per MB. MBTYPE indicates the coding type,
e.g., forward, backward, bi-directional or direct. CBPB is the Coded
Block Pattern for a B-type macroblock. CBPBA is similarly defined as
CBPB except that it has a maximum of four bits. DQUANT defines
changes in the value of a quantizer.

The field Interlaced_information in the third row 850 indicates
whether a MB is interlaced coded, and provides field MV reference
data which informs the decoder of the coding mode of the current MB
or block. The decoder uses this information in calculating the MV for
a current MB. The Interlaced_information field may be stored for
subsequent use as required in the MV memory 749 or other memory in
the decoder.

The Interlaced_information field may also include a flag dct_type
which indicates whether top and bottom field pixel lines in a field
coded MB are reordered from the interleaved order, as discussed above
in connection with FIG. 5.

The MB layer structure shown is used when VOP_prediction_type == 10.
If COD indicates skipped (COD == "1") for a MB in the most recently
decoded I- or P-VOP then the co-located (e.g., co-sited) MB in the
B-VOP is also skipped. That is, no information is included in the
bitstream.

$MVD_f$ is the motion vector of a MB in B-VOP with respect to a
temporally previous reference VOP (an I- or a P-VOP). It consists of
a variable length codeword for the horizontal component followed by a
variable length codeword for the vertical component. For an
interlaced B-VOP MB with field_prediction of "1" and MBTYPE of
forward or interpolate, $MVD_f$ represents a pair of field motion
vectors (top field followed by bottom field) which reference the past
anchor VOP.

$MVD_b$ is the motion vector of a MB in B-VOP with respect to a
temporally following reference VOP (an I- or a P-VOP). It consists of
a variable length codeword for the horizontal component followed by a
variable length codeword for the vertical component. For an
interlaced B-VOP MB with field_prediction of "1" and MBTYPE of
backward or interpolate, $MVD_b$ represents a pair of field MVs (top
field followed by bottom field) which reference the future anchor
VOP.

MVDB is only present in B-VOPs if direct mode is indicated by MODB
and MBTYPE, and consists of a variable length codeword for the
horizontal component followed by a variable length codeword for the
vertical component of each vector. MVDB represents delta vectors that
are used to correct B-VOP MB motion vectors which are obtained by
scaling P-VOP MB motion vectors.

CODA refers to gray scale shape coding. 

The arrangement shown in FIG. 8 is an example only, and various other
arrangements for communicating the relevant information to the
decoder will become apparent to those skilled in the art.

A bit stream syntax and MB layer syntax for use in accordance with
the present invention is described in MPEG-4 VM 8.0 as well as the
Eifrig et al. application referred to previously.

Accordingly, it can be seen that the present invention provides a
scheme for encoding a current MB in a B-VOP, in particular, when the
current MB is field coded, and/or an anchor MB is field coded. A
scheme for direct coding for a field coded MB is presented, in
addition to a coding decision process which uses the minimum of sum
of absolute differences terms to select an optimum mode. A prediction
motion vector (PMV) is also provided for the top and bottom field of
a field coded current MB, including forward and backward PMVs as
required, as well as for frame coded MBs.

Although the invention has been described in 
connection with various specific embodiments, those 
skilled in the art will appreciate that numerous 
adaptations and modifications may be made thereto 
without departing from the spirit and scope of the 
invention as set forth in the claims. 



4. Brief Description of Drawings

FIG. 1 is an illustration of a video object plane (VOP) coding and
decoding process in accordance with the present invention.

FIG. 2 is a block diagram of an encoder in accordance with the
present invention.

FIG. 3 illustrates an interpolation scheme for a half-pixel search.

FIG. 4 illustrates direct mode coding of the top field of an
interlaced-coded B-VOP in accordance with the present invention.

FIG. 5 illustrates direct mode coding of the bottom field of an
interlaced-coded B-VOP in accordance with the present invention.

FIG. 6 illustrates reordering of pixel lines in an adaptive
frame/field prediction scheme in accordance with the present
invention.

FIG. 7 is a block diagram of a decoder in accordance with the present
invention.

FIG. 8 illustrates a macroblock layer structure in accordance with
the present invention.

1. Abstract

A system for coding of digital video images such as bi-directionally
predicted video object planes (B-VOPs) (420), in particular, where
the B-VOP and/or a reference image (400, 440) used to code the B-VOP
is interlaced coded. For a B-VOP macroblock (420) which is co-sited
with a field predicted macroblock of a future anchor picture (440),
direct mode prediction is made by calculating four field motion
vectors ($MV_{f,top}$, $MV_{f,bot}$, $MV_{b,top}$, $MV_{b,bot}$),
then generating the prediction macroblock. The four field motion
vectors and their reference fields are determined from (1) an offset
term ($MV_D$) of the current macroblock's coding vector, (2) the two
future anchor picture field motion vectors ($MV_{top}$, $MV_{bot}$),
(3) the reference field (405, 410) used by the two field motion
vectors of the co-sited future anchor macroblock, and (4) the
temporal spacing ($TR_{B,top}$, $TR_{B,bot}$, $TR_{D,top}$,
$TR_{D,bot}$), in field periods, between the current B-VOP fields and
the anchor fields. Additionally, a coding mode decision process for
the current MB selects a forward, backward, or average field coding
mode according to a minimum sum of absolute differences (SAD) error
which is obtained over the top (430) and bottom (425) fields of the
current MB (420).



2. Representative Drawing
Figure 1